Abstract

Recently it has become clear that many technologies 
follow a generalized version of Moore’s law, i.e. costs 
tend to drop exponentially, at different rates that depend
on the technology. Here we formulate Moore’s
law as a correlated geometric random walk with drift, 
and apply it to historical data on 53 technologies. 
We derive a closed form expression approximating the 
distribution of forecast errors as a function of time. 
Based on hind-casting experiments we show that this 
works well, making it possible to collapse the forecast 
errors for many different technologies at different time 
horizons onto the same universal distribution. This is 
valuable because it allows us to make forecasts for any 
given technology with a clear understanding of the 
quality of the forecasts. As a practical demonstration 
we make distributional forecasts at different time horizons
for solar photovoltaic modules, and show how our
method can be used to estimate the probability that a 
given technology will outperform another technology 
1 Introduction

Technological progress is widely acknowledged as the 
main driver of economic growth, and thus any method 
for improved technological forecasting is potentially 
very useful. Given that technological progress depends
on innovation, which is generally thought of
as something new and unanticipated, forecasting it
might seem to be an oxymoron. In fact there are several
postulated laws for technological improvement,
such as Moore’s law and Wright’s law, that have been 
used to make predictions about technology cost and 
performance. But how well do these methods work? 

Predictions are useful because they allow us to plan, 
but to form good plans it is necessary to know the probabilities
of possible outcomes. Point forecasts are of
limited value unless they are very accurate, and when 
uncertainties are large they can even be dangerous if 
they are taken too seriously. At the very least one 
needs error bars, or better yet, a distributional forecast,
estimating the likelihood of different future outcomes.
Although there are now a few papers testing
technological forecasts,^ there is as yet no method that

^ See e.g. Alchian (1963), Alberth (2008). Nagy et al.
(2013) test the relative accuracy of different methods of fore-
gives distributional forecasts based on an empirically
validated stochastic process. In this paper we remedy 
this situation by deriving the distributional errors for 
a simple forecasting method and testing our predictions
on empirical data on technology costs. To motivate
the problem that we address, consider three technologies
related to electricity generation: coal mining,
nuclear power and photovoltaic modules. Fig. 1 compares
their long-term historical prices. Over the last
150 years the inflation-adjusted price of coal has fluctuated
by a factor of three or so, but shows no long
term trend, and indeed from the historical time series
one cannot reject the null hypothesis of a random
walk with no drift^ (McNerney et al. 2011). Nuclear
power and solar photovoltaic electricity, in contrast,
are both new technologies that emerged at roughly the
same time. The first commercial nuclear power plant
opened in 1956 and the first practical use of solar photovoltaics
was as a power supply for the Vanguard I
satellite in 1958. The cost of electricity generated by
nuclear power is highly variable, but has generally increased
by a factor of two or three during the period
shown here. In contrast, a watt of solar photovoltaic
capacity cost $256 in 1956 (Perlin 1999) (about $1910
in 2013 dollars) vs. $0.82 in 2013, dropping in price
by a factor of about 2,330. Since 1980 photovoltaic
modules have decreased in cost at an average rate of
about 10% per year.

In giving this example we are not trying to make a 
head-to-head comparison of the full system costs for 
generating electricity. Instead we are comparing three 
different technologies, coal mining, nuclear power 
and photovoltaic manufacture. Generating electricity
with coal requires plant construction (whose historical
cost has dropped considerably since the first
plants came online at the beginning of the 20th century).
Generating electricity via solar photovoltaics
has balance of system costs that have not dropped as 
fast as that of modules in recent years. Our point 


casting statistically but do not produce and test a distributional 
estimate of forecast reliability for any particular method. McCrory,
cited in Jantsch (1967), assumes a Gaussian distribution
and uses this to calculate the probability that a targeted level 
of progress be met at a given horizon. Here we assume and test 
a Gaussian distribution for the natural log. 

^ To drive home the point that fossil fuels show no long term 
trend of dropping in cost, after adjusting for inflation coal now 
costs about what it did in 1890, and a similar statement applies 
to oil and gas. 


here is that different technologies can decrease in cost 
at very different rates. 

Predicting the rate of technological improvement 
is obviously very useful for planning and investment. 
But how consistent are such trends? In response to a 
forecast that the trends above will continue, a skeptic 
would rightfully respond, “How do we know that the 
historical trend will continue? Isn’t it possible that 
things will reverse, and over the next 20 years coal 
will drop in price dramatically and solar will go back 
up?”

Our paper provides a quantitative answer to this 
question. We put ourselves in the past, pretend we 
don’t know the future, and use a simple method to 
forecast the costs of 53 different technologies. Actually
going through the exercise of making out-of-sample
forecasts rather than simply doing in-sample
regressions has the essential advantage that it fully
mimics the process of making forecasts and allows us
to say precisely how well forecasts would have performed.
Out-of-sample testing such as we do here is
particularly important when models are mis-specified, 
which one expects for a complicated phenomenon such 
as technological improvement. 

We show how one can combine the experience from
forecasting many technologies to make reliable distributional
forecasts for a given technology. For solar PV
modules, for example, we can say, “Based on experience
with many other technologies, the probability is
roughly 5% that in 2030 the price of solar PV modules
will be greater than or equal to their current (2013)
price”. We can assign a probability to different price
levels at different points in the future, as is done later
in Fig. 10 (where we show that very likely the price
will drop significantly). We can also compare different
technologies to assess the likelihood of different
future scenarios for their relative prices, as is done in
Fig. 11.

Technological costs occasionally experience structural
breaks where trends change. Indeed there are
several clear examples in our historical data, and although
we have not explicitly modeled this, their effect
on forecast errors is included in the empirical
analysis we have done here. The point is that, while
such structural breaks happen, they are not so large
and so common as to override our ability to forecast.
Every technology has its own story, its own specific
set of causes and effects, that explain why costs went
up or down in any given year. Nonetheless, as we
demonstrate here, the long term trends tend to be 
consistent, and can be captured via historical time 
series methods with no direct information about the 
underlying technology-specific stories. 


Figure 1: A comparison of long-term price trends for coal,
nuclear power and solar photovoltaic modules. Prices for
coal and nuclear power are costs in the US in dollars per
kilowatt hour (scale on the left) whereas solar modules
are in dollars per watt-peak, i.e. the cost for the capacity
to generate a watt of electricity in full sunlight (scale on
the right). For coal we use units of the cost of the coal
that would need to be burned in a modern US plant if
it were necessary to buy the coal at its inflation-adjusted
price at different points in the past. Nuclear prices are
busbar costs for US nuclear plants in the year in which
they became operational (from Cooper (2009)). The alignment
of the left and right vertical axes is purely suggestive;
based on recent estimates of levelized costs, we took
$0.177/kWh = $0.82/Wp in 2013 (2013$). The number
$0.177/kWh is a global value produced as a projection
for 2013 by the International Energy Agency (Table 4 in
International Energy Agency (2014)). We note that it is
compatible with estimated values (Table 1 in Baker et al.
(2013), Fig. 4 in International Energy Agency (2014)).
The red cross is the agreed price for the planned UK nuclear
power plant at Hinkley Point, which is scheduled to
come online in 2023 (£0.0925 ≈ $0.14). The dashed line
corresponds to an earlier target of $0.05/kWh set by
the U.S. Department of Energy.


In this paper we use a very simple approach to forecasting
that was originally motivated by Moore’s Law.
As everyone knows, Intel’s ex-CEO, Gordon Moore,
famously predicted that the number of transistors on
integrated circuits would double every two years, i.e.
at an annual rate of about 40%. Making transistors
smaller also brings along a variety of other benefits,
such as increased speed, decreased power consumption,
and less expensive manufacture costs per unit
of computation. As a result it quickly became clear
that Moore’s law applies more broadly, for example,
implying a doubling of computational speed every 18
months.
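The correspondence between a doubling time and an annual rate can be checked with a short calculation. The sketch below is only illustrative; the function names are ours and do not come from the paper.

```python
import math

def annual_growth_rate(doubling_time_years):
    """Fractional growth per year implied by a given doubling time."""
    return 2.0 ** (1.0 / doubling_time_years) - 1.0

def exponential_rate(doubling_time_years):
    """Continuous rate g such that x(t) = x(0) * exp(g * t) doubles in the given time."""
    return math.log(2.0) / doubling_time_years

# Doubling every two years: about 41% per year, i.e. "about 40%".
print(round(annual_growth_rate(2.0), 3))   # 0.414
# Doubling every 18 months: about 59% per year.
print(round(annual_growth_rate(1.5), 3))   # 0.587
```

The continuous rate returned by `exponential_rate` is the quantity that plays the role of the exponential rate of change in the deterministic form of Moore's law used in Section 2.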

Moore’s law stimulated others to look at related
data more carefully, and they discovered that exponential
improvement is a reasonable approximation
for other types of computer hardware as well, such
as hard drives. Since the performance of hard drives
depends on physical factors that are unrelated to transistor
density this is an independent fact, though of
course the fact that mass storage is essential for computation
causes a tight coupling between the two technologies.
Lienhard, Koh and Magee, and others^ examined
data for other products, including many that
have nothing to do with computation or information
processing, and postulated that exponential improvement
is a much more general phenomenon that applies
to many different technologies, even if in most cases
the exponential rates are much slower.

Although Moore’s law is traditionally applied as a
regression of the log of the cost on a deterministic
time trend, we reformulate it here as a geometric random
walk with drift. This has several advantages.
On average it results in more accurate forecasts, especially
at short horizons, indicating that it is indeed
a better model. In addition, this allows us to use
standard results from the time series forecasting literature^.
The technology time series in our sample are

^ Examples include Lienhard (2006), Koh & Magee
(2006, 2008), Bailey et al. (2012), Benson & Magee
(2014a,b).

long spans of time indicate super-exponential improvement
(Nordhaus 2007, Nagy et al. 2011), suggesting that Moore’s law
may only be an approximation reasonably valid over spans of
time of 50 years or less. See also e.g. Funk (2013) for an
explanation of Moore’s law based on geometric scaling, and
Funk & Magee (2014) for empirical evidence regarding fast improvement
prior to large production increase.

^ Several methods have been defined to obtain prediction
intervals, i.e. error bars for the forecasts (Chatfield 1993). The
classical Box-Jenkins methodology for ARIMA processes uses
a theoretical formula for the variance of the process, but does
not account for uncertainty due to parameter estimates. An-
typically rather short, often only 15 or 20 points long,
so to test hypotheses it is essential to pool the data.
Because the geometric random walk is so simple it is
possible to derive formulas for the forecast errors in
closed form. This makes it possible to estimate the
forecast errors as a function of both sample size and
forecasting horizon, and to combine data from many
different technologies into a single analysis. This allows
us to get highly statistically significant results.
Most importantly, because we can systematically
test the method on data for many different
technologies, we can make distributional
forecasts for a single technology and have confidence
in the results.

Motivated by structure we find in the data, we further
extend Moore’s law to allow for the possibility
that changes in price are positively autocorrelated
in time. We assume that the logarithm of the cost
follows a random walk with drift and autocorrelated
noise, more specifically an Integrated Moving Average
process of order (1,1), i.e. an IMA(1,1) model.
Under the assumption of sufficiently large autocorrelation
this method produces a good fit to the empirically
observed forecasting errors. We derive a formula
for the errors of this more general model, assuming
that all technologies have the same autocorrelation
parameter and the forecasts are made using the simple
random walk model. We use this to forecast the
likely distribution of the price of photovoltaic solar 
modules, and to estimate the probability that solar 
modules will undercut a competing technology at a 
given date in the future. 

We want to stress that we do not mean to claim
that the generalizations of Moore’s law explored here
provide the most accurate possible forecasts for technological
progress. There is a large literature on experience
curves^, studying the relationship between
cost and cumulative production originally suggested
by Wright (1936), and many authors have proposed


other approach is to use the empirical forecast errors to estimate
the distribution of forecast errors. In this case, one
can use either the in-sample errors (the residuals, as in e.g.
Taylor & Bunn (1999)), or the out-of-sample forecast errors
(Williams & Goodman 1971, Lee & Scholtes 2014). Several
studies have found that using residuals leads to prediction intervals
which are too tight (Makridakis & Winkler 1989).

^ Arrow (1962), Alchian (1963), Argote & Epple (1990),
Dutton & Thomas (1984), Thompson (2012).


alternatives and generalizations^. Nagy et al. (2013)
tested these alternatives using a data set that is very
close to ours and found that Moore’s and Wright’s
laws were roughly tied for first place in terms of their
forecasting performance. An important caveat is that
Nagy et al.’s study was based on a trend stationary
model, and as we argue here, the difference stationary
model is superior, both for forecasting and for
statistical testing. It seems likely that methods using
auxiliary data such as production, patent activity, or
R&D can be used to make forecasts for technological
progress that incorporate more factors, and that such
methods should yield improvements over the simple
method we use here^.

The key assumption made here is that all technologies
follow the same random process, even if the drift
and volatility parameters of the random process are
technology specific. This allows us to develop distributional
forecasts in a highly parsimonious manner
and efficiently test them out of sample. We restrict
ourselves to forecasting unit cost in this paper, for 
the simple reason that we have data for it and it is 
comparable across different technologies. The work 
presented here provides a simple benchmark against 
which to compare forecasts of future technological 
performance based on other methods. 

The approach of basing technological forecasts on 
historical data that we pursue here stands in sharp 
contrast to the most widely used method, which is 
based on expert opinions. The use of expert opinions
is clearly valuable, and we do not suggest that
it should be supplanted, but it has several serious
drawbacks. Expert opinions are subjective and can
be biased for a variety of reasons (Albright 2002), including
common information, herding, or vested interest.
Forecasts for the costs of nuclear power in the
US, for example, were for several decades consistently
low by roughly a factor of three (Cooper 2009). A
second problem is that it is very hard to assess the 
accuracy of expert forecasts. In contrast the method 
we develop here is objective and the quality of the 
forecasts is known. Nonetheless we believe that both 
methods are valuable and that they should be used 


^ See Goddard (1982), Sinclair et al. (2000), Jamasb (2007),
Nordhaus (2009).

^ See for example Benson & Magee (2014b) for an example
of how patent data can be used to explain variation in rates of
improvement among different technologies.
side-by-side^.

The remainder of the paper develops as follows: In
Section 2 we derive the error distribution for forecasts
based on the geometric random walk as a function
of time horizon and other parameters and show how
the data for different technologies and time horizons
should be collapsed. We also show how this can be
generalized to allow for autocorrelations in the data
and derive similar (approximate) formulas. In Section 3
we describe our data set and present an empirical
relationship between the variance of the noise
and the improvement rate for different technologies.
In Section 4 we describe our method of testing the
models against the data, and present the results in
Section 5. We then apply our method to give a distributional
forecast for solar module prices in Section 6
and show how this can be used to forecast the likelihood
that one technology will overtake another. Finally
we give some concluding remarks in Section 7.
A variety of technical results are given in the appendices.


2 Models 


2.1 Geometric random walk 


In this section we discuss how to formulate Moore’s 
law in the presence of noise and argue that the best 
method is the geometric random walk with drift. We 
then present a formula for the distribution of expected 
errors as a function of the time horizon and the other 
parameters of the model, and generalize the formula 
to allow for autocorrelation in the data generating 
process. This allows us to pool the errors for many 
different technologies. This is extremely useful because
it makes it possible to test the validity of these
results using many short time series (such as the data 
we have here). 

The generalized version of Moore’s law we study
here is a postulated relationship which in its deterministic
form is

$$p_t = p_0 e^{\mu t},$$

where $p_t$ is either the unit cost or the unit price of a
technology at time $t$; we will hereafter refer to it as
the cost. $p_0$ is the initial cost and $\mu$ is the exponential
rate of change. (If the technology is improving then
$\mu < 0$.) In order to fit this to data one has to allow
for the possibility of errors and make an assumption
about the structure of the errors. Typically the literature
has treated Moore’s law using a trend stationary
model, minimizing squared errors to fit a model of the
form

$$y_t = y_0 + \mu t + e_t, \tag{1}$$

where $y_t = \log(p_t)$. From the point of view of the
regression, $y_0$ is the intercept, $\mu$ is the slope and $e_t$ is
independent and identically distributed (IID) noise.

^ For additional discussion of the advantages and drawbacks
of different methods of technology forecasting, see Ayres (1969),
Martino (1993) and National Research Council (2009).

But records of technological performance such as
those we study here are time series, giving the costs $p_{jt}$
for technology $j$ at successive times $t = 1, 2, \ldots, T_j$.
It is therefore more natural to use a time series model.
The simplest possible choice that yields Moore’s law
in the deterministic limit is the geometric random
walk with drift,

$$y_t = y_{t-1} + \mu + n_t. \tag{2}$$

As before $\mu$ is the drift and $n_t$ is an IID noise process.
Letting the noise go to zero recovers the deterministic
version of Moore’s law in either case. When the
noise is nonzero, however, the models behave quite
differently. For the trend stationary model the shocks
are purely transitory, i.e. they do not accumulate. In
contrast, if $y_0$ is the log cost at time $t = 0$, Eq. (2) can
be iterated and written in the form

$$y_t = y_0 + \mu t + \sum_{i=1}^{t} n_i. \tag{3}$$

This is equivalent to Eq. (1) except for the last term.
While in the regression model of Eq. (1) the value of
$y_t$ depends only on the current noise and the slope
$\mu$, in the random walk model (Eq. (2)) it depends
on the sum of previous shocks. Hence shocks in the
random walk model accumulate and the forecasting
errors grow with time horizon as one would expect,
even if the parameters of the model are perfectly estimated^.
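The difference between transitory and accumulating shocks can be illustrated with a small simulation. The following is a minimal sketch, not the procedure used in the paper: the parameter values are arbitrary, and the function names are ours. It builds the trend stationary path of Eq. (1), the recursion of Eq. (2) and the closed form of Eq. (3) from the same shocks, and then compares the spread of the two models around the deterministic trend.

```python
import random

def simulate_paths(y0, mu, K, T, seed=0):
    """Build both models from the same Gaussian shocks (illustrative parameters)."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, K) for _ in range(T)]
    # Eq. (1): each shock affects only its own period.
    trend_stationary = [y0 + mu * t + shocks[t - 1] for t in range(1, T + 1)]
    # Eq. (2): recursion y_t = y_{t-1} + mu + n_t.
    random_walk, y = [], y0
    for t in range(1, T + 1):
        y = y + mu + shocks[t - 1]
        random_walk.append(y)
    # Eq. (3): closed form y_t = y_0 + mu * t + sum of shocks up to t.
    closed_form, cumulative = [], 0.0
    for t in range(1, T + 1):
        cumulative += shocks[t - 1]
        closed_form.append(y0 + mu * t + cumulative)
    return trend_stationary, random_walk, closed_form

def deviation_variance(T, mu, K, trials=4000):
    """Mean squared deviation from the trend at time T for each model."""
    ts_sq, rw_sq = 0.0, 0.0
    for seed in range(trials):
        ts, rw, _ = simulate_paths(0.0, mu, K, T, seed=seed)
        ts_sq += (ts[-1] - mu * T) ** 2
        rw_sq += (rw[-1] - mu * T) ** 2
    return ts_sq / trials, rw_sq / trials

var_ts, var_rw = deviation_variance(T=20, mu=-0.1, K=0.2)
print(var_ts, var_rw)  # roughly K**2 versus roughly T * K**2
```

With known parameters the spread of the trend stationary model stays near $K^2$ at every horizon, whereas for the random walk it grows in proportion to the horizon.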

For time series models a key question is whether
the process has a unit root. Most of our time series
are much too short for unit root tests to be effective
(Blough 1992). Nonetheless, we found that our time
series forecasts are consistent with the hypothesis of
a unit root and that they perform better than several
alternatives.

^ Nagy et al. (2013) used trend stationary models to study
a similar dataset. Their short term forecasts were on average
less accurate and they had to make ad hoc assumptions to pool
data from different horizons.


2.2 Prediction of forecast errors 

We now derive a formula for the forecast errors of the 
geometric random walk as a function of time horizon. 
We assume that all technologies follow the geometric 
random walk, i.e. our noisy version of Moore’s law, 
but with technology-specific parameters. Rewriting
Eq. (2) slightly, it becomes

$$y_{jt} = y_{j,t-1} + \mu_j + n_{jt},$$

where the index $j$ indicates technology $j$. For convenience
we assume that the noise $n_{jt}$ is IID normal, i.e.
$n_{jt} \sim \mathcal{N}(0, K_j^2)$. This means that technology $j$ is
characterized by a drift $\mu_j$ and the standard deviation
of the noise increments $K_j$. We will typically not
include the indices for the technology unless we want
to emphasize the dependence on the technology.

We now derive the expected error distribution for
Eq. (2) as a function of the time horizon $\tau$. Eq. (2)
implies that

$$y_{t+\tau} = y_t + \mu\tau + \sum_{i=t+1}^{t+\tau} n_i. \tag{4}$$

The point forecast $\tau$ steps ahead is^

$$\hat{y}_{t+\tau} = y_t + \hat{\mu}\tau, \tag{5}$$

where $\hat{\mu}$ is the estimated $\mu$. The forecast error is defined as

$$\mathcal{E} = y_{t+\tau} - \hat{y}_{t+\tau}. \tag{6}$$


Putting Eqs. (4) and (5) into Eq. (6) gives

$$\mathcal{E} = \tau(\mu - \hat{\mu}) + \sum_{i=t+1}^{t+\tau} n_i, \tag{7}$$

which separates the error into two parts. The first
term is the error due to the fact that the mean is an
estimated parameter and the second term represents
the error due to the fact that unpredictable random
shocks accumulate (Sampson 1991). Assuming that
the noise increments are IID normal and that the estimation
of the parameters is based on a trailing sample
of $m$ data points, in Appendix B.1 we derive the
scaling of the errors with $m$, $\tau$ and $\hat{K}$, where $\hat{K}^2$ is
the estimated variance.

Because we want to aggregate forecast errors for
technologies with different volatilities, to study how
the errors grow as a function of $\tau$ we use the normalized
mean squared forecast error $\Xi(\tau)$. Assuming
$m > 3$ it is

$$\Xi(\tau) \equiv E\left[\left(\frac{\mathcal{E}}{\hat{K}}\right)^2\right] = \frac{m-1}{m-3}\left(\tau + \frac{\tau^2}{m}\right), \tag{8}$$

where $E$ represents the expectation.

This formula makes intuitive sense. The diffusion
term $\tau$ is due to the accumulation of noisy fluctuations
through time. This term is present even in the limit
$m \to \infty$, where the estimation is perfect. The $\tau^2/m$
term is due to estimation error in the mean. The
need to estimate the variance causes the prefactor^
$(m-1)/(m-3)$ and also means that the distribution
is Student t rather than normal, i.e.

$$\epsilon = \frac{1}{\sqrt{A}}\left(\frac{\mathcal{E}}{\hat{K}}\right) \sim t(m-1), \tag{9}$$

with

$$A = \tau + \frac{\tau^2}{m}. \tag{10}$$
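Eq. (8) can be checked numerically. The sketch below is a Monte Carlo illustration under stated assumptions, not the derivation of Appendix B.1: we take $m$ to be the number of log-differences in the estimation window (so that the variance estimate has $m-1$ degrees of freedom, matching the $t(m-1)$ distribution), and the drift, volatility and number of trials are arbitrary choices of ours.

```python
import random

def theoretical_xi(m, tau):
    """Right-hand side of Eq. (8)."""
    return (m - 1) / (m - 3) * (tau + tau ** 2 / m)

def simulated_xi(m, tau, mu=-0.1, K=0.2, trials=20000, seed=0):
    """Monte Carlo estimate of E[(error / K_hat)^2] for a random walk with drift.

    Assumption: m counts the log-differences used for estimation.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        past = [mu + rng.gauss(0.0, K) for _ in range(m)]      # estimation window
        future = [mu + rng.gauss(0.0, K) for _ in range(tau)]  # realized increments
        mu_hat = sum(past) / m
        K_hat_sq = sum((d - mu_hat) ** 2 for d in past) / (m - 1)
        # Eq. (6): y_{t+tau} - (y_t + mu_hat * tau); the y_t terms cancel.
        error = sum(future) - mu_hat * tau
        total += error ** 2 / K_hat_sq
    return total / trials

m, tau = 10, 5
print(theoretical_xi(m, tau), simulated_xi(m, tau))
```

The two numbers should agree up to Monte Carlo noise, and the result does not depend on the particular values chosen for the drift and the volatility, which is the property that makes pooling possible.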


^ The point forecast is the expected logarithm of the cost for
the random walk with drift model, $E[y_{t+\tau}]$. We assume $y_{t+\tau}$
is normally distributed. This means the cost is log-normally
distributed and the forecast of the median cost is $e^{E[y_{t+\tau}]}$.
Because the mean of a log-normal distribution also depends on the
variance of the underlying normal distribution, the expected
cost diverges when $\tau \to \infty$ due to parameter uncertainty. Our
forecasts here are for the median cost. This has the important
advantage that (unlike the mean or the mode) it does not require
an estimate of the variance, and is therefore simpler and
more robust.


Eq. (9) is universal in the sense that the right hand
side is independent of $\mu_j$, $K_j$, and $\tau$. It depends neither
on the properties of the technology nor on the
time horizon. As a result we can pool forecast errors
for different technologies at different time horizons.
This property is extremely useful for statistical testing
and can also be used to construct distributional
forecasts for a given technology.

^ The prefactor is significantly different from one only when
$m$ is small. Sampson (1991) derived the same formula but
without the prefactor since he worked with the true variance.
Sampson (1991) also showed that the square term due to error
in the estimation of the drift exists for the regression on a time
trend model, and for more general noise processes. See also
Clements & Hendry (2001).
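To make the pooling concrete, the following sketch computes the normalized errors of Eq. (9) from rolling-window hindcasts on a single series of log costs. It is a simplified illustration rather than the exact procedure of Section 4: the function name is ours, and we assume that each window uses the $m$ most recent log-differences to estimate the drift and the volatility.

```python
import math
import statistics

def normalized_hindcast_errors(log_costs, m, max_tau):
    """Return (tau, epsilon) pairs from rolling-window hindcasts on one series.

    Assumption: each estimation window holds m log-differences (m + 1 points).
    A window with constant differences gives K_hat = 0 and cannot be normalized.
    """
    diffs = [b - a for a, b in zip(log_costs, log_costs[1:])]
    n = len(log_costs)
    results = []
    for t in range(m, n - 1):                # t is the index of the forecast origin
        window = diffs[t - m:t]              # the m most recent log-differences
        mu_hat = statistics.fmean(window)
        K_hat = statistics.stdev(window)     # sample standard deviation
        for tau in range(1, min(max_tau, n - 1 - t) + 1):
            error = log_costs[t + tau] - (log_costs[t] + mu_hat * tau)  # Eq. (6)
            A = tau + tau ** 2 / m                                      # Eq. (10)
            results.append((tau, error / (K_hat * math.sqrt(A))))       # Eq. (9)
    return results

# Toy series (made-up numbers, for illustration only).
toy = [0.0, 1.0, 3.0, 4.0, 6.0, 7.0]
print(normalized_hindcast_errors(toy, m=4, max_tau=3))
```

Because the right hand side of Eq. (9) does not depend on the technology or the horizon, the values returned for different series and different horizons can simply be concatenated into one sample and compared with a Student t distribution with $m-1$ degrees of freedom.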

2.3 Generalization for autocorrelation 

We now generalize the formula above to allow for autocorrelations
in the error terms. Although the uncorrelated
random walk model above does surprisingly
well, there is good evidence that there are positive
autocorrelations in the data. In order to incorporate
this structure we extend the results above for
an ARIMA(0,1,1) (autoregressive integrated moving
average) model. The zero indicates that we do not
use the autoregressive part, so we will abbreviate this
as an IMA(1,1) model in what follows. The IMA(1,1)
model is of the form


$$y_t - y_{t-1} = \mu + v_t + \theta v_{t-1}, \tag{11}$$

with the noise $v_t \sim \mathcal{N}(0, \sigma^2)$. This model is also a geometric
random walk, but with correlated increments
when $\theta \neq 0$ (the autocorrelations of the time series
are positive when $\theta > 0$).

We chose this model rather than other alternatives
mainly for its simplicity^. Moreover, our data are
often time-aggregated, that is, our yearly observations
are averages of the observed costs over the year.
It has been shown that if the true process is a random
walk with drift then aggregation can lead to substantial
autocorrelation (Working 1960). In any case,
while every technology certainly follows an idiosyncratic
pattern and may have a complex autocorrelation
structure and specific measurement errors, using
the IMA(1,1) as a universal model allows us to parsimoniously
understand the empirical forecast errors
and generate robust prediction intervals.

A key quantity for pooling the data is the variance,
which by analogy with the previous model we call $K^2$
for this model as well. It is easy to show that

$$K^2 = \mathrm{var}(y_t - y_{t-1}) = \mathrm{var}(v_t + \theta v_{t-1}) = (1 + \theta^2)\sigma^2,$$

see e.g. Box & Jenkins (1970). The relevant formulas
for this case are derived in Appendix B.2. We make
the same point forecasts as before given by Eq. (5).
If the variance is known the distribution of forecast
errors is

$$\mathcal{E} \sim \mathcal{N}(0, \sigma^2 A^*), \tag{12}$$

with

$$A^* = -2\theta + \left(1 + \frac{2(m-1)\theta}{m} + \theta^2\right)\left(\tau + \frac{\tau^2}{m}\right). \tag{13}$$

Note that we recover Eq. (10) when $\theta = 0$. In the
usual case where the variance has to be estimated,
we derive an approximate formula for the growth and
distribution of the forecast errors by assuming that $\hat{K}$
and $\mathcal{E}$ are independent. The expected mean squared
normalized error is

$$\Xi(\tau) = E\left[\left(\frac{\mathcal{E}}{\hat{K}}\right)^2\right] = \frac{m-1}{m-3}\,\frac{A^*}{1+\theta^2}, \tag{14}$$

and the distribution of rescaled normalized forecast
errors is

$$\epsilon^* = \frac{1}{\sqrt{A^*/(1+\theta^2)}}\left(\frac{\mathcal{E}}{\hat{K}}\right) \sim t(m-1). \tag{15}$$

^ Our individual time series are very short, which makes
it very difficult to find the proper order of differencing and
to distinguish between different ARMA models. For instance,
slightly different ARIMA models such as (1,1,0) are far from
implausible for many technologies.

These formulas are only approximations so we compare
them to more exact results obtained through simulations
in Appendix B.2; see in particular Fig. 12.
For $m > 30$ the approximation is excellent, but there
are discrepancies for small values of $m$.

As before the right hand side is independent of all
the parameters of the technology as well as the time
horizon. Eq. (15) can be viewed as the distribution of
errors around a point forecast, which makes it possible
to collapse many technologies onto a single distribution.
This property is extremely useful for statistical
testing, i.e. for determining the quality of the model.
But its greatest use, as we demonstrate in Section 6, is
that it makes it possible to formulate a distributional
forecast for the future costs of a given technology.

When $m$ is sufficiently large the Student t distribution
is well-approximated by a standard normal.
Using the mean given by Eq. (5) and the variance determined
by Eqs. (12) and (13), the distributional forecast
for the future logarithm of the cost $y_{t+\tau}$ conditioned
on $(y_t, \ldots, y_{t-m+1})$ is

$$y_{t+\tau} \sim \mathcal{N}\left(y_t + \hat{\mu}\tau,\; \hat{K}^2 A^*/(1+\theta^2)\right). \tag{16}$$

We will return later to the estimation of $\theta$.
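As an illustration of how Eq. (16) might be used in practice, the sketch below builds the normal forecast distribution for the log cost and evaluates the probability that the future cost exceeds a given level. It is a minimal sketch under assumptions of ours: all available log-differences are used for estimation, $\hat{K}$ is taken to be their sample standard deviation, $\theta$ is supplied by the user, and the cost series is made up.

```python
import math
import statistics

def a_star(m, tau, theta):
    """Variance factor of Eq. (13); equals tau + tau**2 / m when theta = 0."""
    return -2.0 * theta + (1.0 + 2.0 * (m - 1) * theta / m + theta ** 2) * (tau + tau ** 2 / m)

def forecast_distribution(log_costs, tau, theta):
    """Normal approximation of Eq. (16) for the log cost tau steps ahead."""
    diffs = [b - a for a, b in zip(log_costs, log_costs[1:])]
    m = len(diffs)                       # assumption: m counts log-differences
    mu_hat = statistics.fmean(diffs)     # drift estimate, as used in Eq. (5)
    K_hat = statistics.stdev(diffs)      # assumption: sample standard deviation
    mean = log_costs[-1] + mu_hat * tau
    variance = K_hat ** 2 * a_star(m, tau, theta) / (1.0 + theta ** 2)
    return statistics.NormalDist(mean, math.sqrt(variance))

def prob_cost_above(log_costs, tau, theta, cost_level):
    """Probability that the cost tau steps ahead exceeds cost_level."""
    dist = forecast_distribution(log_costs, tau, theta)
    return 1.0 - dist.cdf(math.log(cost_level))

# Made-up cost series, for illustration only.
costs = [100.0, 90.0, 85.0, 70.0, 66.0, 60.0]
log_costs = [math.log(c) for c in costs]
dist = forecast_distribution(log_costs, tau=3, theta=0.0)
print(math.exp(dist.mean))                           # median cost forecast
print(prob_cost_above(log_costs, 3, 0.0, costs[-1])) # chance of no net decline
```

Exponentiating the quantiles of the returned distribution gives prediction intervals for the cost itself, since the cost is log-normally distributed under this approximation.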

2.4 Alternative hypotheses 

In addition to autocorrelation we investigated other
ways to generalize the model, such as heavy tails and
long-memory. As discussed in Appendix C.4, based on
forecast errors we found little evidence for heavy tails.
Long-memory is in a sense an extreme version of the
autocorrelation hypothesis^, which produces errors
that grow faster as a function of the forecasting horizon
$\tau$ than a random walk. Given that long-memory
is a natural result of nonstationarity, which is commonly
associated with technological change, our prior
was that it was a highly plausible alternative. However,
as we will see, the geometric random walk with
normal noise increments and autocorrelations seems
to give good agreement for the time scaling of forecasting
errors, so we did not investigate long-memory
further.


3 Data 

3.1 Data collection 

The bulk of our data on technology costs comes
from the Santa Fe Institute’s Performance Curve
DataBase^, which was originally developed by Bela
Nagy and collaborators; we augment it with a few
other datasets. These data were collected via literature
search, with the principal criterion for selection
being availability. Fig. 2 plots the time series for each
data set. The motley character of our dataset is clear:

^ Note that although we make the estimate of the variance
$\theta$-dependent, we always use the estimate of the mean corresponding
to $\theta = 0$. We do this because this is simpler and more
robust.

^ A process has long-memory if the autocorrelation function
of its increments is not integrable. Under the long-memory
hypothesis one expects the diffusion term of the normalized
squared errors to scale as $\Xi(\tau) \sim \tau^{2H}$, where $H$ is the Hurst
exponent. In the absence of long-memory $H = 1/2$, but for
long-memory $1/2 < H < 1$. Long-memory can arise from many
causes, including nonstationarity. It is easy to construct plausible
processes with the $\mu$ parameter varying where the mean
squared errors grow faster than $\tau^2$.

^ pcdb.santafe.edu


The time series for different technologies are of different
lengths and they start and stop at different times.
The sharp cutoff for the chemical data, for example,
reflects the fact that it comes from a book published
by the Boston Consulting Group in 1972. Table 1
gives a summary of the properties of the data and
more description of the sources can be found in Appendix A.
This plot also makes it clear that technologies
improve at very different rates.



Figure 2: Cost vs. time for each technology in our dataset.
This shows the 53 technologies out of the original set of
66 that have a significant rate of cost improvement (DNA
sequencing is divided by 1000 to fit on the plot; the y-axis
is in log scale). More details can be found in Table 1 and
Appendix A.

A ubiquitous problem in forecasting technological
progress is finding invariant units. A favorable example
is electricity. The cost of generating electricity can
be measured in dollars per kWh, making it possible
to sensibly compare competing technologies and measure
their progress through time. Even in this favorable
example, however, making electricity cleaner and
safer has a cost, which has affected historical prices
for technologies such as coal and nuclear power in recent
years, and means that their costs are difficult to
compare to clean and safe but intermittent sources of
power such as solar energy. To take an unfavorable
example, our dataset contains appliances such as television
sets, that have dramatically increased in quality
through time. Yet another problem is that some of
them are potentially subject to scarcity constraints,
which might introduce additional trends
and fluctuations.

Gordon (1990) provides quality change adjustments for a

One should therefore regard our results here as a
lower bound on what is possible, in the sense that
performing the analysis with better data in which all
technologies had invariant units would very likely improve
the quality of the forecasts. We would love to
be able to make appropriate normalizations but the
work involved is prohibitive; if we dropped all questionable
examples we would end with little remaining
data. Most of the data are costs, but in a few cases
they are prices; again, this adds noise but if we were
able to be consistent that should only improve our results.
We have done various tests removing data and
the basic results are not sensitive to what is included
and what is omitted (see Fig. 14 in the appendix).

We have removed some technologies that are too 
similar to each other from the Performance Curve 
Database. For instance, when we have two datasets 
for the same technology, we keep only one of them. 
Our choice was based on data quality and length of 
the time series. This selection left us with 66 tech¬ 
nologies belonging to different sectors that we label 
as chemistry, genomics, energy, hardware, consumer 
durables and food. 

3.2 Data selection and descriptive statistics

In this paper we are interested in technologies that
are improving, so we restrict our analysis to those
technologies whose rate of improvement is statistically
significant based on the available sample. We used
a simple one-sided t-test on the first-difference (log)
series and removed all technologies for which the p-value
indicates that we cannot reject the null that $\mu_j = 0$
at the 10% significance level.
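The selection rule can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names `student_t_cdf` and `improvement_p_value` are hypothetical, the Student-t distribution function is obtained by simple numerical integration rather than a statistics library, the example series is made up, and the increments are assumed not to be all identical (otherwise the sample standard deviation is zero).

```python
import math

def student_t_cdf(t, df, steps=2000):
    """Student-t CDF by Simpson's rule on the density (steps must be even)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def improvement_p_value(log_costs):
    """One-sided p-value for H0: mu = 0 against H1: mu < 0 (costs falling)."""
    diffs = [b - a for a, b in zip(log_costs, log_costs[1:])]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    t_stat = mean / (sd / math.sqrt(n))
    return student_t_cdf(t_stat, df=n - 1)

# Hypothetical log-cost series, made up for illustration only.
falling = [0.0, -0.1, -0.25, -0.3, -0.45, -0.5]
keep = improvement_p_value(falling) < 0.10  # True means the series is retained
```

A technology is retained when the p-value falls below 0.10, which mirrors the 10% threshold used in the text.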

Table 1 reports the p-values for the one-sided t-tests
and the bottom of the table shows the technologies
that are excluded as a result. Table 1 also shows the
estimated drift $\tilde{\mu}_j$ and the estimated standard deviation
$\tilde{K}_j$ based on the full sample for each technology
$j$. (Throughout the paper we use a hat to denote estimates
performed within an estimation window of size

number of durable goods. These methods (typically hedonic 
regressions) require additional data. 

This is under the assumption that $\theta = 0$.


Technology                Industry      T  $\tilde{\mu}$  p value  $\tilde{K}$  $\tilde{\theta}$
Transistor                Hardware     38  -0.50   0.00   0.24   0.19
Geothermal.Electricity    Energy       26  -0.05   0.00   0.02   0.15
Milk..US.                 Food         79  -0.02   0.00   0.02   0.04
DRAM                      Hardware     37  -0.45   0.00   0.38   0.14
Hard.Disk.Drive           Hardware     20  -0.58   0.00   0.32  -0.15
Automotive..US.           Cons. Goods  21  -0.08   0.00   0.05   1.00
Low.Density.Polyethylene  Chemical     17  -0.10   0.00   0.06   0.46
Polyvinylchloride         Chemical     23  -0.07   0.00   0.06   0.32
Ethanolamine              Chemical     18  -0.06   0.00   0.04   0.36
Concentrating.Solar       Energy       26  -0.07   0.00   0.07   0.91
AcrylicFiber              Chemical     13  -0.10   0.00   0.06   0.02
Styrene                   Chemical     15  -0.07   0.00   0.05   0.74
Titanium.Sponge           Chemical     19  -0.10   0.00   0.10   0.61
VinylChloride             Chemical     11  -0.08   0.00   0.05  -0.22
Photovoltaics             Energy       34  -0.10   0.00   0.15   0.05
PolyethyleneHD            Chemical     15  -0.09   0.00   0.08   0.12
VinylAcetate              Chemical     13  -0.08   0.00   0.06   0.33
Cyclohexane               Chemical     17  -0.05   0.00   0.05   0.38
BisphenolA                Chemical     14  -0.06   0.00   0.05  -0.03
Monochrome.Television     Cons. Goods  22  -0.07   0.00   0.08   0.02
PolyethyleneLD            Chemical     15  -0.08   0.00   0.08   0.88
Laser.Diode               Hardware     13  -0.36   0.00   0.29   0.37
PolyesterFiber            Chemical     13  -0.12   0.00   0.10  -0.16
Caprolactam               Chemical     11  -0.10   0.00   0.08   0.40
IsopropylAlcohol          Chemical      9  -0.04   0.00   0.02  -0.24
Polystyrene               Chemical     26  -0.06   0.00   0.09  -0.04
Polypropylene             Chemical     10  -0.10   0.00   0.07   0.26
Pentaerythritol           Chemical     21  -0.05   0.00   0.07   0.30
Ethylene                  Chemical     13  -0.06   0.00   0.06  -0.26
Wind.Turbine..Denmark.    Energy       20  -0.04   0.00   0.05   0.75
Paraxylene                Chemical     12  -0.10   0.00   0.09  -1.00
DNA.Sequencing            Genomics     13  -0.84   0.00   0.83   0.26
NeopreneRubber            Chemical     13  -0.02   0.00   0.02   0.83
Formaldehyde              Chemical     11  -0.07   0.00   0.06   0.36
SodiumChlorate            Chemical     15  -0.03   0.00   0.04   0.85
Phenol                    Chemical     14  -0.08   0.00   0.09  -1.00
Acrylonitrile             Chemical     14  -0.08   0.01   0.11   1.00
Beer..Japan.              Food         18  -0.03   0.01   0.05  -1.00
Primary.Magnesium         Chemical     40  -0.04   0.01   0.09   0.24
Ammonia                   Chemical     13  -0.07   0.02   0.10   1.00
Aniline                   Chemical     12  -0.07   0.02   0.10   0.75
Benzene                   Chemical     17  -0.05   0.02   0.09  -0.10
Sodium                    Chemical     16  -0.01   0.02   0.02   0.42
Methanol                  Chemical     16  -0.08   0.02   0.14   0.29
MaleicAnhydride           Chemical     14  -0.07   0.03   0.11   0.73
Urea                      Chemical     12  -0.06   0.03   0.09   0.04
Electric.Range            Cons. Goods  22  -0.02   0.03   0.04  -0.14
PhthalicAnhydride         Chemical     18  -0.08   0.03   0.15   0.31
CarbonBlack               Chemical      9  -0.01   0.03   0.02  -1.00
Titanium.Dioxide          Chemical      9  -0.04   0.04   0.05  -0.41
Primary.Aluminum          Chemical     40  -0.02   0.06   0.08   0.39
Sorbitol                  Chemical      8  -0.03   0.06   0.05  -1.00
Aluminum                  Chemical     17  -0.02   0.09   0.04   0.73
Free.Standing.Gas.Range   Cons. Goods  22  -0.01   0.10   0.04  -0.30
CarbonDisulfide           Chemical     10  -0.03   0.12   0.06  -0.04
Ethanol..Brazil.          Energy       25  -0.05   0.13   0.22  -0.62
Refined.Cane.Sugar        Food         34  -0.01   0.23   0.06  -1.00
CCGT.Power                Energy       10  -0.04   0.25   0.15  -1.00
HydrofluoricAcid          Chemical     11  -0.01   0.25   0.04   0.13
SodiumHydrosulfite        Chemical      9  -0.01   0.29   0.07  -1.00
Corn..US.                 Food         34  -0.02   0.30   0.17  -1.00
Onshore.Gas.Pipeline      Energy       14  -0.02   0.31   0.14   0.62
Motor.Gasoline            Energy       23  -0.00   0.47   0.05   0.43
Magnesium                 Chemical     19  -0.00   0.47   0.04   0.58
Crude.Oil                 Energy       23   0.01   0.66   0.07   0.63
Nuclear.Electricity       Energy       20   0.13   0.99   0.22  -0.13


Table 1: Descriptive statistics and parameter estimates
(using the full sample) for all available technologies. They
are ordered by the p-value of a one-sided t-test for $\tilde{\mu}$, i.e.
based on how strong the evidence is that they are improving.
The improvement of the last 13 technologies is not
statistically significant and so they are dropped from further
analysis - see the discussion in the text.
$m$ and a tilde to denote the estimates made using the
full sample). Histograms of $\tilde{\mu}_j$, $\tilde{K}_j$, sample size $T_j$
and $\tilde{\theta}_j$ are given in Fig. 3.
Figure 3: Histogram for the estimated parameters for each
technology $j$ based on the full sample (see also Table 1).
$\tilde{\mu}_j$ is the annual logarithmic rate of decrease in cost, $\tilde{K}_j$
is the standard deviation of the noise, $T_j$ is the number of
available years of data and $\tilde{\theta}_j$ is the autocorrelation.


3.3 Relation between drift and volatility 

Fig. 4 shows a scatter plot of the estimated standard
deviation $\tilde{K}_j$ for technology $j$ vs. the estimated
improvement rate $-\tilde{\mu}_j$. A linear fit gives $\tilde{K} =
0.02 - 0.76\tilde{\mu}$ with $R^2 = 0.87$ and standard errors of
0.008 for the intercept and 0.04 for the slope, as shown
in the figure. A log-log fit (a power law in $-\tilde{\mu}$) gives
$R^2 = 0.73$, with standard errors for the scaling
constant of 0.18 and for the exponent of 0.06. This
indicates that on average the uncertainty $\tilde{K}_j$ gets bigger
as the improvement rate $-\tilde{\mu}_j$ increases. There is
no reason that we are aware of to expect this a priori.
One possible interpretation is that for technological
investment there is a trade-off between risk and returns.
Another possibility is that faster improvement
amplifies fluctuations.


The $\tilde{\theta}_j$ are estimated by maximum likelihood letting $\tilde{\mu}_{MLE}$
be different from $\tilde{\mu}$.



Figure 4: Scatter plot of the estimated standard deviation
$\tilde{K}_j$ for technology $j$ against its estimated improvement rate
$-\tilde{\mu}_j$. The dashed line shows a linear fit (which is curved
when represented in log scale); the solid line is a log-log
fit. Technologies with a faster rate of improvement have
higher uncertainty in their improvement.


4 Estimation procedures 

4.1 Statistical validation 

We use hindcasting for statistical validation, i.e. for
each technology we pretend to be at a given date in
the past and make forecasts for dates in the future relative
to the chosen date. We have chosen this procedure
for several reasons. First, it directly tests the
predictive power of the model rather than its goodness
of fit to the data, and so is resistant to overfitting.
Second, it mimics the same procedure that one would
follow in making real predictions, and third, it makes
efficient use of the data available for testing.

We fit the model at each time step to the $m$ most
recent changes in cost (i.e. the most recent $m + 1$
years of data). We use the same value of $m$ for all
technologies and for all forecasts. Because most of 
the time series in our dataset are quite short, and 
because we are more concerned here with testing the 
procedure we have developed rather than with making 
optimal forecasts, unless otherwise noted we choose 
m = 5. This is admittedly very small, but it has the 
advantage that it allows us to make a large number of 

This method is also sometimes called backtesting and is a 
form of cross-validation. 
forecasts. We will return later to discuss the question
of which value of m makes the best forecasts. 

We perform hindcasting exhaustively in the sense
that we make as many forecasts as possible given the
choice of $m$. For technology $j$, the cost data $y_t =
\log p_t$ exists in years $t = 1, 2, \ldots, T_j$. We then make
forecasts for each feasible year and each feasible time
horizon, i.e. we make forecasts $\hat{y}_{t_0+\tau}(t_0)$ rooted in
years $t_0 = (m + 1, \ldots, T_j - 1)$ with forecast horizon
$\tau = (1, \ldots, T_j - t_0)$.
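The exhaustive enumeration of forecast origins and horizons can be sketched as follows. The function names are hypothetical and the example uses a made-up series length; the only thing taken from the text is the range of feasible origins and horizons.

```python
def hindcast_pairs(T, m):
    """Return every feasible (t0, tau) pair for a series observed in years 1..T.

    t0 runs over m+1, ..., T-1 (so that m past changes are available) and
    tau over 1, ..., T-t0 (so that the target year t0+tau is observed).
    """
    return [(t0, tau) for t0 in range(m + 1, T) for tau in range(1, T - t0 + 1)]

def forecasts_at_horizon(T, m, tau):
    """Number of forecasts a series of length T contributes at horizon tau."""
    return max(0, T - m - tau)

# Hypothetical series of 10 years with a trailing window of m = 5.
pairs = hindcast_pairs(T=10, m=5)
n_total = len(pairs)  # equals (T - m - 1) * (T - m) / 2
```

Because the count grows quadratically with the series length, long series contribute far more forecast errors than short ones.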





Figure 5: Data available for testing as a function of the
forecast time horizon. Here # of technologies refers to
the number of technology time series that are long enough
to make at least one forecast at a given time horizon $\tau$,
which is measured in years. Similarly # of forecasts refers
to the total number of forecasts that can be made at time
horizon $\tau$. The horizontal line at $\tau = 20$ years indicates our
(somewhat arbitrary) choice of a maximum time horizon.

Since our dataset includes technology time series of
different length (see Table 1 and Fig. 2) the number
of possible forecasts that can be made with a given
historical window $m$ is highest for $\tau = 1$ and decreases
for longer horizons. Fig. 5 shows the total
number of possible forecasts that can be made with
our dataset at a given horizon $\tau$ and the number of

The number of possible forecasts that can be made using
a technology time series of length $T_j$ is $[T_j - (m + 1)][T_j - m]/2$,
which is $O(T_j^2)$. Hence the total number of forecast errors contributed
by a given technology time series is disproportionately
dependent on its length. However, we have checked that aggregating
the forecast errors so that each technology has an equal
weight does not qualitatively change the results.


technology time series that are long enough to make
at least one forecast at horizon $\tau$. This shows that
the amount of available data decreases dramatically
for large forecast horizons. We somewhat arbitrarily
impose an upper bound of $\tau_{max} = 20$, but find
this makes very little difference in the results (see
Appendix C.3). There are a total of 8212 possible
forecasts that can be made with an historical window
of $m = 5$, and 6391 forecasts that can be made with
$\tau \le 20$.

To test for statistical significance we use a surrogate
data procedure (explained below). There are three
reasons for doing this: The first is that, although we
derived approximate formulas for the forecast errors
in Eqs. (14) and (15), when $\theta \neq 0$ the approximation
is not very good for $m = 5$. The second is that the
rolling window approach we use for hindcasting implies
overlaps in both the historical sample used to
estimate parameters at each time $t_0$ and overlapping
intervals in the future for horizons with $\tau > 1$. This
implies substantial correlation in the empirical forecast
errors, which complicates statistical testing. The
third reason is that, even if the formulas were exact,
we expect finite sample fluctuations. That is, with a
limited number of technologies and short time series,
we do not expect to find the predicted result exactly;
the question is then whether the deviation that we
observe is consistent with what is expected.

The surrogate data procedure estimates a null distribution
for the normalized mean squared forecast
error under the hypothesized model. This is done by
simulating both the model and the forecasting procedure
to create a replica of the dataset and the forecasts.
This is repeated for many different realizations
of the noise process in order to generate the null distribution.
More specifically, for each technology we
generate $T_j$ pseudo cost data points using the model equation
with $\mu = \tilde{\mu}_j$, $K = \tilde{K}_j$ and a given value of $\theta$, thereby
mimicking the structure of the data set. We then estimate
the parameters and perform hindcasting just as
we did for the real data, generating the same number
of forecasts and computing the mean squared forecast
error. This process is then repeated many times with
different random number seeds to estimate the distribution.
This same method can be used to estimate
expected deviations for any quantity, e.g. we also use
this to estimate the expected deviation of the finite
sample distribution from the predicted distribution of
forecast errors.
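Generating one surrogate series can be sketched as below. This is an illustration, not the procedure's actual code. It assumes that the increments of the log cost follow a first-order moving average, $y_t - y_{t-1} = \mu + v_t + \theta v_{t-1}$ with i.i.d. normal $v_t$, and that $K^2$ is the variance of these increments, so that the standard deviation of $v_t$ is $K/\sqrt{1 + \theta^2}$. That reading is consistent with the variance factor $K^2/(1+\theta^2)$ in Eq. (16), but the model equation itself is not reproduced in this section. The function name and its arguments are hypothetical.

```python
import math
import random

def surrogate_series(T, mu, K, theta, y0=0.0, rng=None):
    """Simulate T log-cost points from an autocorrelated random walk with drift.

    Assumed model: y_t = y_{t-1} + mu + v_t + theta * v_{t-1}, v_t ~ N(0, sigma^2),
    with sigma chosen so that the increments have standard deviation K.
    """
    rng = rng or random.Random()
    sigma = K / math.sqrt(1.0 + theta ** 2)
    v_prev = rng.gauss(0.0, sigma)
    y = [y0]
    for _ in range(T - 1):
        v = rng.gauss(0.0, sigma)
        y.append(y[-1] + mu + v + theta * v_prev)
        v_prev = v
    return y

# One replica with the full-sample photovoltaics values from Table 1.
replica = surrogate_series(T=34, mu=-0.10, K=0.15, theta=0.63, rng=random.Random(1))
```

Repeating the call with different seeds, and running the same hindcasting on each replica, gives draws from the null distribution.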

4.2 Parameter estimation 

We estimate the mean and the variance for each technology
dynamically, using a rolling window approach
to fit the parameters based on the $m + 1$ most recent
data points. In each year $t_0$ for which forecasts are
made the drift is estimated as the sample mean of
the first differences,

$$\hat{\mu}_{t_0} = \frac{1}{m} \sum_{i=t_0-m}^{t_0-1} (y_{i+1} - y_i) = \frac{y_{t_0} - y_{t_0-m}}{m}, \qquad (17)$$

where the last equality follows from the fact that the
sum is telescopic, and implies that only two points
are needed to estimate the drift. The volatility is
estimated using the unbiased estimator

$$\hat{K}^2_{t_0} = \frac{1}{m-1} \sum_{i=t_0-m}^{t_0-1} \left[(y_{i+1} - y_i) - \hat{\mu}_{t_0}\right]^2. \qquad (18)$$

This procedure gives us a variable number of forecasts
for each technology $j$ and time horizon $\tau$ rooted at all
feasible times $t_0$. We record the forecasting errors
$\mathcal{E}_{t_0,\tau} = y_{t_0+\tau} - \hat{y}_{t_0+\tau}(t_0)$ and the associated values
of $\hat{K}_{t_0}$ for all $t_0$ and all $\tau$ where we can make forecasts.
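The rolling-window estimates and the recorded forecast error can be sketched as follows. The function names and the example series are hypothetical; the point forecast is taken to be the mean of the distribution in Eq. (16), i.e. the last observed log cost plus the estimated drift times the horizon.

```python
import math

def rolling_estimates(y, t0, m):
    """Drift and volatility from the m most recent log-cost changes up to year t0.

    y is a list of log costs with y[0] holding year 1, so year t is y[t - 1].
    """
    diffs = [y[i] - y[i - 1] for i in range(t0 - m, t0)]
    mu_hat = sum(diffs) / m  # telescopes to (y[t0 - 1] - y[t0 - m - 1]) / m
    k_hat = math.sqrt(sum((d - mu_hat) ** 2 for d in diffs) / (m - 1))
    return mu_hat, k_hat

def forecast_error(y, t0, m, tau):
    """Realized minus predicted log cost at horizon tau for a forecast rooted at t0."""
    mu_hat, _ = rolling_estimates(y, t0, m)
    return y[t0 + tau - 1] - (y[t0 - 1] + mu_hat * tau)

# Hypothetical log-cost series covering 7 years.
y = [0.0, -0.1, -0.3, -0.4, -0.6, -0.7, -1.0]
mu_hat, k_hat = rolling_estimates(y, t0=6, m=5)
err = forecast_error(y, t0=6, m=5, tau=1)
```

A positive error means the realized cost is higher than forecast, i.e. progress was slower than predicted.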

The autocorrelation parameter $\theta$ for the generalized
model has to be treated differently. Our time series
are simply too short to make reasonable rolling window,
technology-specific estimates for $\theta$. With such
small values of $m$ the estimated autocorrelations are
highly unreliable.

Our solution is to use a global value of $\theta$, i.e. we
use the same value for all technologies and all points
in time. It may well be that $\theta$ is technology specific,
but given the short amount of data it is necessary
to make a choice that performs well under forecasting.
This is a classic bias-variance trade-off, where
the variance introduced by statistical estimation of a
parameter is so large that the forecasts produced by
a biased model with this parameter fixed are superior.
With very long time series this could potentially

This is different from the maximum likelihood estimator,
which does not make use of Bessel’s correction (i.e. dividing
by $(m - 1)$ instead of $m$). Our choice is driven by the fact
that in practice we use a very small $m$, making the bias of the
maximum likelihood estimator rather large.


be avoided. This procedure seems to work well. It 
leaves us with a parameter that has to be estimated 
in-sample, but since this is only one parameter esti¬ 
mated from a sample of more than 6,000 forecasts the 
resulting estimate should be reasonably reliable. 

Evidence concerning autocorrelations is given in
Fig. 3, where we present a histogram for the values
of $\tilde{\theta}_j$ for each technology $j$ based on the full sample.
The results are highly variable. Excluding eight likely
outliers where $\tilde{\theta}_j = \pm 1$, the mean across the sample
is 0.27, and 35 out of the remaining 45 improving
technologies have positive values of $\tilde{\theta}_j$. This seems to
suggest that $\theta$ tends to be positive.

We use two different methods for estimating a
global value of $\theta$. The first method takes advantage
of the fact that the magnitude of the forecast errors
is an increasing function of $\theta$ (we assume $\theta > 0$) and
chooses $\theta_m$ ($m$ as in “matched”) to match the empirically
observed forecast errors, leading to $\theta_m = 0.63$
as described in the next section. The second method
takes a weighted average $\theta_w$ ($w$ as in “weighted”) calculated
as follows. We exclude all technologies for
which the estimate of $\theta$ reveals specification or estimation
issues ($\tilde{\theta} \approx 1$ or $\tilde{\theta} \approx -1$). Then at each horizon
we compute a weighted average, with the weights
proportional to the number of forecasts made with
that technology. Finally we take the average of the
first 20 horizon-specific estimated values of $\theta$, leading
to $\theta_w = 0.25$. See Appendix D.
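The weighted-average calculation can be sketched as follows. Several details are assumptions made for illustration: the cutoff `edge` that decides when an estimate counts as "approximately plus or minus one", the weight of a technology at horizon $\tau$ being its number of feasible forecasts $T_j - m - \tau$, and the choice to skip horizons at which no technology contributes. The function name and the toy inputs are hypothetical.

```python
def weighted_theta(thetas, lengths, m, max_horizon=20, edge=0.995):
    """Average over horizons of forecast-count-weighted means of theta estimates."""
    kept = [(th, T) for th, T in zip(thetas, lengths) if abs(th) < edge]
    per_horizon = []
    for tau in range(1, max_horizon + 1):
        # Weight = number of forecasts technology j contributes at horizon tau.
        weights = [max(0, T - m - tau) for _, T in kept]
        total = sum(weights)
        if total == 0:
            continue  # no technology is long enough for this horizon
        per_horizon.append(sum(w * th for w, (th, _) in zip(weights, kept)) / total)
    if not per_horizon:
        raise ValueError("no feasible forecasts for the given inputs")
    return sum(per_horizon) / len(per_horizon)

# Toy inputs: the third estimate sits on the boundary and is excluded.
theta_w = weighted_theta([0.2, 0.6, 1.0], [10, 10, 30], m=5, max_horizon=2)
```

With equal series lengths the two retained technologies get equal weight at every horizon, so the toy result is their simple mean.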

5 Comparison of models to data 

In comparing the model to data we address the fol¬ 
lowing five questions: 

1. Is the scaling law for the increase in forecasting
errors as a function of time derived in Eqs. (8)
and (14) consistent with the data?

2. Does there exist a value of $\theta$ such that the null
hypothesis of the model is not rejected? If so,
what is this value, and how strong is the evidence
that it is positive?

3. When the normalized errors for different technologies
at different time horizons are collapsed
onto a single distribution, does this agree
with the Student distribution as predicted by
Eq. (15)?

4. Do the errors scale with the trailing sample size
$m$ as predicted under the assumption that the
random process is stationary (i.e. that parameters
are not changing in time)?

5. Is the model well-specified? 

We will see that we get clear affirmative answers to 
the first four questions but we are unable to answer 
question (5). 

5.1 Normalized forecast errors as a function of $\tau$

To answer the first question we compute the sample
estimate of the mean squared normalized forecast error
$\Xi(\tau)$, averaging over all available forecasts for all
technologies at each time horizon with $\tau \le 20$ (see
Eq. (14)). Fig. 6 compares the empirical results to
the model with three different values of the autocorrelation
parameter $\theta$. Because the approximate error
estimates derived in Eq. (14) break down for small values
of $m$, for each value of $\theta$ we estimate the expected
mean squared errors under the null hypothesis of the
model via the surrogate data procedure described in
Section 4.1.

The model does a good job of predicting the scaling
of the forecast errors as a function of the time
horizon $\tau$. The errors are predicted to grow approximately
proportional to $(\tau + \tau^2/m)$; at long horizons
the error growth at each value of $\theta$ closely parallels
that for the empirical forecasts. This suggests that
this scaling is correct, and that there is no strong
support for modifications such as long-memory that
would predict alternative rates of error growth.
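As a small worked illustration of this scaling, the factor $\tau + \tau^2/m$ can be tabulated directly. The function name is hypothetical, and the factor is only the approximate proportionality stated above, not the full error formula.

```python
def error_growth(tau, m):
    """Approximate growth factor of the mean squared forecast error."""
    return tau + tau ** 2 / m

# Growth factors for the trailing window used in the tests (m = 5).
growth_m5 = {tau: error_growth(tau, 5) for tau in (1, 5, 10, 20)}

# At a 20-year horizon the quadratic term dominates for small m.
ratio = error_growth(20, 5) / error_growth(20, 16)
```

The linear term reflects the accumulation of noise, while the quadratic term reflects the error in the estimated drift, which is why a larger window $m$ matters most at long horizons.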

Using $\theta_m = 0.63$ gives a good match to the empirical
data across the entire range of time horizons.
Note that even though we chose $\theta_m$ in order to get the
best possible match, given that we are rescaling data
for different technologies by the empirically measured
sample standard deviations over very short samples
of length $m = 5$, and that we are predicting across 20
different time horizons simultaneously, the ability to
find a value of the parameter $\theta$ that matches this well
was far from guaranteed. (It is completely possible,
for example, that there would simply not exist a value
of $\theta < 1$ yielding errors that were sufficiently large).

When $\theta = 0$ the simulated and analytical results are visually
indistinguishable. Fig. 6 uses the analytical formula, Eq. (8).



Figure 6: Growth of the mean squared normalized forecast
error $\Xi(\tau)$ for the empirical forecasts compared to predictions
using different values of $\theta$. The empirical value of
the normalized error $\Xi(\tau)$ is shown by black dots. The
grey area corresponds to the 95% confidence intervals for
the case $\theta = \theta_m$. The dashed line represents the predicted
squared normalized error with $\theta = 0$, the dot-dash line is
for $\theta_w = 0.25$ and the solid line is for $\theta_m = 0.63$.


To test the statistical significance of the results for
different values of $\theta$ and $\tau$ we use the surrogate data
procedure described at the end of Section 4.1. For
$\theta_m = 0.63$ we indicate error bars by showing in grey
the region containing the 95% of the simulated realizations
with errors closest to the mean. For $\tau = 1$
and $\tau = 2$ the predicted errors are visibly below the
empirical observations, but the difference is within the
error bars (though on the edge of the error bars for
$\tau = 1$); the agreement is very good at all other values
of $\tau$. The autocorrelation parameter $\theta_w = 0.25$
is weakly rejected for $\tau$ between 1 and 6 and weakly
accepted elsewhere, indicating that it is very roughly
the lowest value of $\theta$ that is consistent with the data
at the two standard deviation level. In contrast the
case $\theta = 0$, which gives normalized error predictions
that are lower by about a factor of two, is clearly
well outside of the error bars (note the logarithmic
scale). This strongly indicates that a positive value of
$\theta$ is required to match the observed errors, satisfying
$\theta > \theta_w = 0.25$.
5.2 Distribution of forecast errors

We now address question (3) by testing whether we
correctly predict the distribution of forecast errors.
Fig. 7 shows the distribution of rescaled forecast errors
using $\theta_m = 0.63$ with Eq. (15) to rescale the
errors. Different values of $\tau$ are plotted separately,
and each is compared to the predicted Student distribution.
Overall, the fit is good but at longer horizons
forecast errors tend to be positive, that is, realized
technological progress is slightly slower than
predicted. We have tested to see if this forecast bias
is significant, and for $\tau < 11$ we cannot reject the
null that there is no bias even at the 10% level. At
higher horizons there is evidence of forecast bias, but
we have to remember that at these horizons we have
much less data (and fewer technologies) available for
testing.



Figure 7: Cumulative distribution of empirical rescaled
normalized forecast errors at different forecast horizons $\tau$.
The forecast errors for each technology $j$ are collapsed using
Eq. (15) with $\theta = \theta_m = 0.63$. This is done for each
forecast horizon $\tau = 1, 2, \ldots, 20$ as indicated in the legend.
The green thick curve is the theoretical prediction.
The positive and negative errors are plotted separately.
For the positive errors we compute the number of errors
greater than a given value $X$ and divide by the total number
of errors to estimate the cumulative probability and
plot in semi-log scale. For the negative errors we do the
same except that we take the absolute value of the error
and plot against $-X$.

Fig. 8 shows the empirical distribution with all values
of $\tau$ pooled together, using rescalings corresponding
to $\theta = 0$, $\theta_w$, and $\theta_m$. The empirical distribution
is fairly close to the theoretical prediction, and as expected
the fit with $\theta_m = 0.63$ is better than with
$\theta_w = 0.25$ or $\theta = 0$.



Figure 8: Cumulative distribution of empirical rescaled
normalized forecast errors with all $\tau$ pooled together for
three different values of the autocorrelation parameter,
$\theta = 0$ (dashed line), $\theta = 0.25$ (dot-dash line) and $\theta = 0.63$
(solid line). See the caption of Fig. 7 for a description of
how the cumulative distributions are computed and plotted.

To test whether the observed deviations of the empirical
error distribution from the predicted distribution
are significant we once again use the surrogate
data approach described at the end of Section 4.1.
As before we generate many replicas of the dataset
and forecasts. For each replica of the dataset and
forecasts we compute a set of renormalized errors $\epsilon^*$
and construct their distribution. We then measure
the average distance between the surrogate distribution
and the Student distribution as described in Appendix E.
Repeating this process 10,000 times results
in the sampling distribution of the deviations from the
Student distribution under the null hypothesis that
the model is correct. We then compare this to the corresponding
value of the average distance between the
real data and the Student distribution, which gives
us a p-value under the null hypothesis. We find that
the model with $\theta_m = 0.63$ is accepted. In contrast
$\theta_w = 0.25$ is rejected with p-values ranging from 1%
to 0.1%, depending on the way in which the average
distance is computed. The case with $\theta = 0$ is very
strongly rejected.

These results make it clear that the positive autocorrelations
are both statistically significant and important.
The statistical testing shows that $\theta = 0.63$
provides a good estimate for the observed forecasting
errors across a large range of time horizons, with normalized
forecasting errors that are well-described by
the Student distribution.


5.3 Dependence on sample size m 

So far we have used only a small fraction of the data 
to make each forecast. The choice for the trailing 
sample of m = 5 was for testing purposes, allowing us 
to generate a large number of forecasts and test our 
method for estimating their accuracy. 

We now address the question of the optimal value
of $m$. If the process is stationary in the sense that the
parameters $(\mu, K, \theta)$ are constant, one should always
use the largest possible value of $m$. If the process
is nonstationary, however, it can be advantageous to
use a smaller value of $m$, or alternatively a weighted
average that decays as it goes into the past. How
stationary is the process generating technology costs,
and what is the best choice of $m$?

We experimented with increasing $m$, as shown in
Fig. 9, and compared this to the model with $\theta_m =
0.63$. We find that the errors drop as $m$ increases
roughly as one would expect if the process were stationary,
and that the model does a reasonably good
job of forecasting the errors (see also Appendix C.1).
This indicates that the best choice is the largest possible
value of $m$, which in this case is $m = 16$. However
we should emphasize that it is entirely possible that
testing on a sample with longer time series might yield
an optimal value of $m > 16$.


Note that to check forecast errors for high $m$ we have used
only technologies for which at least $m + 2$ years were available.
For large values of $m$ the statistical variation increases due to
lack of data.

We present the results up to m = 16 because less than a 
third of the technologies can be used with larger sample sizes. 
We have performed the same analysis up to m = 35, where only 
5 technologies are left, and the results remain qualitatively the 
same. 



Figure 9: Mean squared normalized forecast error $\Xi$ as a
function of the forecast horizon $\tau$ for different sizes of the
trailing sample size $m$. This is done for $m = (4, 8, 12, 16)$,
as shown in the legend. The corresponding theoretical
predictions are made using $\theta_m = 0.63$, and are shown as
solid curves ordered in the obvious way from top ($m = 4$)
to bottom ($m = 16$).

5.4 Is the model well-specified? 

Because most of our time series are so short it is difficult
to say whether or not the model is well-specified.
As already noted, for such short series it is impossible
to usefully estimate technology-specific values of the
parameter $\theta$, which has forced us to use a global value
for all technologies. Averaging over the raw samples
suggests a relatively low value $\theta_w = 0.25$, but a much
higher value $\theta_m = 0.63$ is needed to match the empirically
observed errors. However we should emphasize
that with such short series $\theta$ is poorly estimated, and
it is not clear that averaging across different technologies
is sufficient to fix this problem.

In our view it would be surprising if there are not
technology-specific variations in $\theta$; after all $\mu_j$ and $K_j$
vary significantly across technologies. So from this
point of view it seems likely that the model with a
global $\theta$ is mis-specified. It is not clear whether this
would be true if we were able to measure technology-specific
values of $\theta_j$. It is remarkable that such a simple
model can represent a complicated process such as
technological improvement as well as it does, and in
any case, as we have shown, using $\theta = \theta_m$ does a good
job of matching the empirically observed forecasting
errors. Nonetheless, testing with more data is clearly
desirable.

6 Application to solar PV modules 


In this section we provide a distributional forecast for
the price of solar photovoltaic modules. We then show
how this can be used to make a comparison to a hypothetical
competing technology in order to estimate the
probability that one technology will be less expensive
than another at a given time horizon.

6.1 A distributional forecast for solar energy

We have shown that the autocorrelated geometric random
walk can be used to forecast technological cost
improvement and that the formula we have derived for
the distribution of forecast errors works well when applied
to many different technologies. We now demonstrate
how this can be used to make a distributional
forecast for the cost improvement of a given technology.
The fact that the method has been extensively
tested on many technologies in the previous section
gives us some confidence that this forecast is reliable.

Figure 10: Forecast for the cost of photovoltaic modules in
2013 $/Wp. The point forecasts and the error bars are produced
using Eq. (19) and the parameters discussed in the
text. Shading indicates the quantiles of the distribution
corresponding to 1, 1.5 and 2 standard deviations.

We make the forecast using Eq. (16). We use all
available years of past data ($m = 33$) to fit the parameters
$\hat{\mu}_S = \tilde{\mu}_S = -0.10$ and $\hat{K}_S = \tilde{K}_S = 0.15$,
and we used $\theta = \theta_m = 0.63$. The forecast is given
by Eq. (16) with appropriate substitutions of parameters,
i.e.

$$y_S(t + \tau) \sim \mathcal{N}\left(y_S(t) + \tilde{\mu}_S \tau,\; \tilde{K}_S^2 A^*/(1 + \theta_m^2)\right), \qquad (19)$$

where $A^*(\theta_m)$ is defined in Eq. (13). Fig. 10 shows the
predicted distribution of likely prices for solar photovoltaic
modules for time horizons up to 2030. The
intervals corresponding to plus or minus two standard
deviations in Fig. 10 are 95% prediction intervals.

The prediction says that it is likely that solar PV modules will continue to drop in cost at the roughly 10% rate that they have in the past. Nonetheless there is a small probability (about 5%) that the price in 2030 will be higher than it was in 2013. While it might seem remarkable to forecast 15 years ahead with only 33 years of past data, note that throughout most of the paper we were forecasting up to 20 years ahead with only six years of data. As one uses more past data, the width of the distributional forecast decreases. In addition there are considerable variations in the standard deviations $\tilde{K}_j$ of the technologies in Table 1; these variations are reflected in the width of the distribution at any given forecasting horizon. The large deviation from the trend line that solar module costs made in the early part of the millennium causes the estimated future variance to be fairly large.

Except for the estimation of $\theta$, no data from other technologies were used in this forecast. Nonetheless, data from other technologies were key in giving us confidence that the distributional forecast is reliable.
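As a rough numerical illustration of the solar forecast, the sketch below evaluates the mean and standard deviation of the log price change relative to 2013 and the probability that the 2030 price exceeds the 2013 price. It assumes natural logarithms, uses the normal approximation of Eq. (19), and takes for $A^*$ the algebraic simplification of the expression derived in Appendix B.2; the function names are illustrative and not part of the original analysis.

```python
import math


def a_star(theta: float, m: int, tau: float) -> float:
    """Variance factor A* (simplified form of the Appendix B.2 expression)."""
    return -2 * theta + (1 + 2 * (m - 1) * theta / m + theta**2) * (tau + tau**2 / m)


def log_change_forecast(tau, mu=-0.10, K=0.15, m=33, theta=0.63):
    """Mean and standard deviation of y_S(t + tau) - y_S(t) under Eq. (19)."""
    mean = mu * tau
    var = K**2 * a_star(theta, m, tau) / (1 + theta**2)
    return mean, math.sqrt(var)


def prob_price_higher(tau, **params):
    """Probability that the price tau years ahead exceeds the current price."""
    mean, sd = log_change_forecast(tau, **params)
    # P(N(mean, sd^2) > 0) = Phi(mean / sd), written with the complementary error function
    return 0.5 * math.erfc(-mean / (sd * math.sqrt(2)))


if __name__ == "__main__":
    tau = 2030 - 2013
    mean, sd = log_change_forecast(tau)
    low, high = math.exp(mean - 2 * sd), math.exp(mean + 2 * sd)
    print(f"median price ratio 2030/2013: {math.exp(mean):.2f}")
    print(f"two-standard-deviation band for the ratio: [{low:.2f}, {high:.2f}]")
    print(f"P(price in 2030 > price in 2013): {prob_price_higher(tau):.3f}")
```

With the parameter values quoted in the text, the last probability should come out close to the "about 5%" figure stated above.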


This forecast is consistent with the one made several years ago by Nagy et al. (2013) using data only until 2009. It is difficult to compare this forecast with expert elicitation studies, which are often more precise in terms of which PV technology and which market is predicted and are often concerned with levelized costs. Individual experts' distributional predictions for LCOE (see Fig. 6 in Bosetti et al. (2012)) seem tight as compared to ours (for modules only). However, the predictions for the probability that PV will cost less than $0.30/Wp in 2030 reported in Fig. 3 of Curtright et al. (2008) are overall comparable with ours.
6.2 Estimating the probability that one technology will be less expensive than another

Suppose we want to compute the probability that a given technology will be less expensive than another competing technology at a given point in the future. We illustrate how this can be done by comparing the log cost of photovoltaic modules $y_S$ with the log cost of a hypothetical alternative technology $y_C$. Both the cost of photovoltaic modules and technology C are assumed to follow Eq. (16), but for the sake of argument we assume that, like coal, technology C has historically on average had a constant cost, i.e. $\tilde{\mu}_C = 0$. We also assume that the estimation period is the same, and that $\theta_C = \theta_S = \theta_m$. We want to compute the probability that $\tau$ steps ahead $y_S < y_C$. The probability that $y_S < y_C$ is the probability that the random variable $Z = y_C - y_S$ is positive. Since $y_S$ and $y_C$ are normal, assuming they are independent their difference is normal, i.e.

$$Z \sim \mathcal{N}(\mu_Z, \sigma_Z^2),$$

where $\mu_Z = (y_C(t) - y_S(t)) + \tau(\tilde{\mu}_C - \tilde{\mu}_S)$ and $\sigma_Z^2 = (A^*/(1 + \theta_m^2))(\tilde{K}_S^2 + \tilde{K}_C^2)$. The probability that $y_S < y_C$ is the integral for the positive part, which is expressed in terms of the error function

$$\Pr(y_S < y_C) = \int_0^\infty f_Z(z)\, dz = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{\mu_Z}{\sqrt{2}\,\sigma_Z}\right)\right]. \tag{20}$$

Figure 11: Probability that solar photovoltaic modules become less expensive than a hypothetical competing technology C whose initial cost is one third that of solar but is on average not improving, i.e. $\tilde{\mu}_C = 0$. The curves show Eq. (20) using $\tilde{\mu}_S = -0.10$, $\tilde{K}_S = 0.15$, $m = 33$ for solar PV and three different values of the noise parameter $\tilde{K}_C$ for technology C. The crossing point is at $\tau \approx 11$ (2024) in all three cases.


In Fig. 11 we plot this function using the parameters estimated for photovoltaics, assuming that the cost of the competing technology is a third that of solar at the starting date in 2013, and that it is on average not dropping in cost, i.e. $\tilde{\mu}_C = 0$. We consider three different levels of the noise parameter $\tilde{K}_C$ for technology C. Note that changing the noise parameter does not change the expected time when the curves cross.

The main point of this discussion is that with our 
method we can reliably forecast the probability that 
a given technology will surpass a competitor. 
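The comparison described in this subsection can be sketched in a few lines. The code below evaluates the probability that solar is cheaper than technology C at horizon $\tau$, assuming natural logarithms, a zero-drift competitor starting at one third of the solar cost, and the simplified form of $A^*$ from Appendix B.2. The three values used for the competitor's noise parameter are hypothetical, since the values behind Fig. 11 are not stated in the text.

```python
import math


def a_star(theta: float, m: int, tau: float) -> float:
    """Variance factor A* (simplified form of the Appendix B.2 expression)."""
    return -2 * theta + (1 + 2 * (m - 1) * theta / m + theta**2) * (tau + tau**2 / m)


def prob_solar_cheaper(tau, cost_ratio_c_over_s=1 / 3, mu_s=-0.10, mu_c=0.0,
                       k_s=0.15, k_c=0.15, m=33, theta=0.63):
    """Pr(y_S < y_C) at horizon tau for independent normal forecasts."""
    # Mean of Z = y_C - y_S: initial log cost gap plus the drift difference
    mu_z = math.log(cost_ratio_c_over_s) + tau * (mu_c - mu_s)
    # Variance of Z: sum of the two forecast variances (same m and theta assumed)
    var_z = a_star(theta, m, tau) / (1 + theta**2) * (k_s**2 + k_c**2)
    return 0.5 * (1 + math.erf(mu_z / math.sqrt(2 * var_z)))


if __name__ == "__main__":
    for k_c in (0.0, 0.1, 0.2):  # hypothetical noise levels for technology C
        row = [f"{prob_solar_cheaper(tau, k_c=k_c):.2f}" for tau in (5, 11, 20)]
        print(f"K_C = {k_c}: P at tau = 5, 11, 20 -> {row}")
```

The probability equals one half when the mean of $Z$ is zero, that is at $\tau = \ln 3 / 0.10 \approx 11$ years, whatever the noise level of technology C.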


6.3 Discussion of PV relative to coal-fired 
electricity and nuclear power 

In the above discussion we have carefully avoided dis¬ 
cussing a particular competing technology. A forecast 
for the full cost of solar PV electricity requires pre¬ 
dicting the balance of system costs, for which we lack 
consistent historical data, and unlike module costs, 
the full cost depends on factors such as insolation, in¬ 
terest rates and local installation costs. As solar PV 
grows to be a significant portion of the energy sup¬ 
ply the cost of storage will become very important. 
Nonetheless, it is useful to discuss it in relation to the 
two competitors mentioned in the introduction. 

An analysis of coal-fired electricity, breaking down costs into their components and examining each of the trends separately, has been made by McNerney et al. (2011). They show that while coal plant costs (which are currently roughly 40% of total cost) dropped historically, this trend reversed circa 1980. Even if the recent trend reverses and plant construction cost drops dramatically in the future, the cost of coal is likely to eventually dominate the total cost of coal-fired electricity. As mentioned before, this is because the historical cost of coal is consistent with a random walk without drift, and currently fuel is about 40% of total costs. If coal remains constant in cost (except for random fluctuations up or down) then this places a hard bound on how much the total cost of coal-fired electricity can decrease. Since typical plants have efficiencies of the order of 1/3 there is not much room for making the burning of coal more efficient: even a spectacular efficiency improvement to 2/3 of the theoretical limit is only an improvement of a factor of two, corresponding to the average progress PV modules make in about 7.5 years. Similar arguments apply to oil and natural gas.


Because historical nuclear power costs have tended to increase, not just in the US but worldwide, even a forecast that they will remain constant seems optimistic. Levelized costs for solar PV powerplants in 2013 were as low as 0.078-0.142 Euro/kWh (0.09-0.16$) in Germany (Kost et al. 2013), and in 2014 solar PV reached a new record low with an accepted bid of $0.06/kWh for a plant in Dubai. When these are compared to the projected cost of $0.14/kWh in 2023 for the Hinkley Point nuclear reactor, it appears that the two technologies already have roughly equal costs, though of course a direct comparison is difficult due to factors such as intermittency, waste disposal, insurance costs, etc.

As a final note, skeptics have claimed that solar PV cannot be ramped up quickly enough to play a significant role in combatting global warming. A simple trend extrapolation of the growth of solar energy (PV and solar thermal) suggests that it could represent 20% of the energy consumption by 2027. In contrast the "hi-Ren" (high renewable) scenario of the International Energy Agency, which is presumably based on expert analysis, assumes that PV will generate 16% of total electricity in 2050. Thus even in their optimistic forecast they assume PV will take 25 years longer than the historical trend suggests (to hit a lower target). We hope in the future to formulate similar methods for forecasting production so that we can better assess the reliability of such forecasts. See Appendix F and Fig. 20 in particular.

Though much has been made of the recent drop in the price of natural gas due to fracking, which has had a large effect, one should bear in mind that the drop is tiny in comparison to the factor of about 2,330 by which solar PV modules have dropped in price. The small change induced by fracking is only important because it is competing in a narrow price range with other fossil fuel technologies. In work with other collaborators we have examined not just oil, coal and gas, but more than a hundred minerals; all of them show remarkably flat historical prices, i.e. they all change by less than an order of magnitude over the course of a century.

Levelized costs decrease more slowly than module costs, but do decrease (Nemet). For instance, installation costs per watt have fallen in Germany and are now about half what they are in the U.S. (Barbose et al. 2014).

See http://www.renewableenergyworld.com/rea/news/article/2015/01/dubai-utility-dewa-procures-the-worlds-cheapest-solar-energy-ever


7 Conclusion 

Many technologies follow a similar pattern of progress 
but with very different rates. In this paper we have 
proposed a simple method based on the autocorrelated
geometric random walk to provide robust pre¬
dictions for technological progress that are stated as 
distributions of outcomes rather than point forecasts. 
We assume that all technologies follow a similar pro¬ 
cess except for their rates of improvement and volatil¬ 
ity. Under this assumption we can pool forecast er¬ 
rors of different technologies to obtain an empirical 
estimation of the distribution of forecast errors. 

One of the essential points of this paper is that the 
use of many technologies allows us to make a better 
forecast for a given technology, such as solar PV mod¬ 
ules. Although using many technologies does not af¬ 
fect our point forecast, it is the essential element that 
allowed us to test our distributional forecasts in order 
to ensure that they are reliable. The point is that by 
treating all technologies as essentially the same except 
for their parameters, and collapsing all the data onto 
a single distribution, we can pool data from many 
technologies to gain confidence in and calibrate our 
method for a given technology. It is of course a bold 
assumption to say that all technologies follow a ran¬ 
dom process with the same form, but the empirical 
results indicate that this is a good hypothesis.

We do not want to suggest in this paper that we 
think that Moore’s law provides an optimal forecast¬ 
ing method. Quite the contrary, we believe that by 
gathering more historical data, and by adding other 
auxiliary variables such as production, R&D, patent 
activity, there should be considerable room for im¬ 
proving forecasting power. In the future we anticipate
that theories will eventually provide causal ex¬
planations for why technologies improve at such dif¬ 
ferent rates and this will result in better forecasts. 
Nonetheless, in the meantime the method we have 
introduced here provides a benchmark against which 
other approaches can be measured. It provides a proof 
of principle that technologies can be successfully fore¬ 
cast and that the errors in the forecasts can be reliably 
predicted. 

From a policy perspective we believe that our 
method can be used to provide an objective point of 
comparison to expert forecasts, which are often biased 
by vested interests and other factors. The fact that we 
can associate uncertainties with our predictions makes 
them far more useful than simple point forecasts. The 
example of solar PV modules illustrates that differ¬ 
ences in the improvement rate of competing technolo¬ 
gies can be dramatic, and that an underdog can begin 
far behind the pack and quickly emerge as a front¬ 
runner. Given the urgency of limiting greenhouse gas 
emissions, it is fortuitous that a green technology also 
happens to have such a rapid improvement rate, and 
is likely to eventually surpass its competition within 
10-20 years. In a context where limited resources
for technology investment constrain policy makers to 
focus on a few technologies that have a real chance 
to eventually achieve and even exceed grid parity, the 
ability to have improved forecasts and know how ac¬ 
curate they are should prove particularly useful. 


Appendix 
A Data 


The data are mostly taken from the Santa-Fe Performance Curve DataBase, accessible at pcdb.santafe.edu. The database has been con-

photovoltaic prices has been collected from public releases of Strategies Unlimited, Navigant and SPV Market Research. The data on nuclear energy is from Koomey & Hultman (2007) and Cooper

The DNA sequencing data is from Wetterstrand (2015) (cost per human-size genome), and for each year we took the last available month (September for 2001-2002 and October afterwards) and corrected for inflation using the US GDP deflator.


B Distribution of forecast errors 


B.1 Random walk with drift


This section derives the distribution of forecast errors. Note that by definition $y_{t+1} - y_t = \Delta y \sim \mathcal{N}(\mu, K^2)$. To obtain $\hat{\mu}$ we assume $m$ sequential independent observations of $\Delta y$, and compute the average. The sampling distribution of the mean of a normal variable is

$$\hat{\mu} \sim \mathcal{N}(\mu, K^2/m). \tag{21}$$

Moreover, $n_t \sim \mathcal{N}(0, K^2)$ implies

$$\sum_{i=t+1}^{t+\tau} n_i \sim \mathcal{N}(0, \tau K^2). \tag{22}$$

Using Eqs. (21) and (22) in Eq. (7) we see that the distribution of forecast errors is Gaussian

$$\mathcal{E} = \tau(\mu - \hat{\mu}) + \sum_{i=t+1}^{t+\tau} n_i \sim \mathcal{N}(0, K^2 A), \tag{23}$$

where $A = \tau + \tau^2/m$, as in Eq. (10). Eq. (23) implies

$$\frac{1}{\sqrt{A}} \frac{\mathcal{E}}{K} \sim \mathcal{N}(0, 1). \tag{24}$$

Eq. (23) leads to $E[\mathcal{E}^2] = K^2(\tau + \tau^2/m)$, which appears in more general form in Sampson (1991). However we also have to account for the fact that we have to estimate the variance. Since $\hat{K}^2$ is the sample variance of a normally distributed random variable, the following standard result holds

$$\frac{(m-1)\hat{K}^2}{K^2} \sim \chi^2(m-1). \tag{25}$$

If $Z \sim \mathcal{N}(0,1)$, $U \sim \chi^2(r)$, and $Z$ and $U$ are independent, then $Z/\sqrt{U/r} \sim t(r)$. Taking $Z$ from Eq. (24), $U$ from Eq. (25) and assuming independence, we find that the rescaled normalized forecast errors have a Student t distribution

$$\frac{1}{\sqrt{A}} \frac{\mathcal{E}}{\hat{K}} \sim t(m-1). \tag{26}$$
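The Student t result can be checked by simulation. The sketch below, a minimal illustration rather than the procedure used in the paper, simulates hindcasts on a random walk with drift, rescales each forecast error by $\sqrt{A}\,\hat{K}$, and compares the simulated second moment with the variance of a $t(m-1)$ variable. The drift and volatility values, the choice $m = 12$, $\tau = 5$ and all function names are assumptions made for illustration.

```python
import math
import random
import statistics


def rescaled_error(m, tau, mu, K, rng):
    """One simulated hindcast; returns E / (sqrt(A) * K_hat)."""
    past = [mu + rng.gauss(0.0, K) for _ in range(m)]      # m observed increments
    future = [mu + rng.gauss(0.0, K) for _ in range(tau)]  # tau increments to be forecast
    mu_hat = statistics.fmean(past)
    k_hat = statistics.stdev(past)       # sample standard deviation (m - 1 denominator)
    error = sum(future) - tau * mu_hat   # y_{t+tau} minus the forecast y_t + mu_hat * tau
    A = tau + tau**2 / m
    return error / (math.sqrt(A) * k_hat)


def mean_squared_rescaled_error(m=12, tau=5, n=20000, seed=1):
    """Monte Carlo estimate of the second moment of the rescaled error."""
    rng = random.Random(seed)
    return statistics.fmean(rescaled_error(m, tau, 0.04, 0.05, rng) ** 2 for _ in range(n))


if __name__ == "__main__":
    m = 12
    print("simulated:", round(mean_squared_rescaled_error(m=m), 3))
    print("variance of t(m - 1):", round((m - 1) / (m - 3), 3))
```

A moderately large $m$ is used here because for small $m$ the squared error has a very heavy tail and the Monte Carlo average converges slowly.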
Note that the t distribution has mean 0 but variance $df/(df - 2)$, where $df$ are the degrees of freedom. Hence the expected squared rescaled normalized forecast error is

$$E\left[\left(\frac{1}{\sqrt{A}} \frac{\mathcal{E}}{\hat{K}}\right)^2\right] = 0 + \mathrm{Var}\left[\frac{1}{\sqrt{A}} \frac{\mathcal{E}}{\hat{K}}\right] = \frac{m-1}{m-3},$$

leading to Eq. (8) in the main text.

B.2 Integrated Moving Average

Here we derive the distribution of forecast errors given that the true process is an IMA(1,1) with known $\theta$, $\mu$ and $K$ are estimated assuming that the process is a random walk with drift, and the forecasts are made as if the process was a random walk with drift. First note that, from Eq. (11),

$$y_{t+\tau} = y_t + \mu\tau + \sum_{i=t+1}^{t+\tau} [v_i + \theta v_{i-1}].$$

Using Eq. (5) to make the prediction implies that

$$\mathcal{E} = y_{t+\tau} - \hat{y}_{t+\tau} = \tau(\mu - \hat{\mu}) + \sum_{i=t+1}^{t+\tau} [v_i + \theta v_{i-1}].$$

Now we can substitute

$$\hat{\mu} = \frac{1}{m}\sum_{i=t-m}^{t-1} (y_{i+1} - y_i) = \mu + \frac{1}{m}\sum_{i=t-m}^{t-1} [v_{i+1} + \theta v_i]$$

to obtain

$$\mathcal{E} = -\frac{\tau}{m}\sum_{i=t-m}^{t-1} [v_{i+1} + \theta v_i] + \sum_{i=t+1}^{t+\tau} [v_i + \theta v_{i-1}].$$

Expanding the two sums, this can be rewritten

$$\mathcal{E} = -\frac{\tau\theta}{m} v_{t-m} - \frac{\tau(1+\theta)}{m}\sum_{i=t-m+1}^{t-1} v_i + \left(\theta - \frac{\tau}{m}\right) v_t + (1+\theta)\sum_{i=t+1}^{t+\tau-1} v_i + v_{t+\tau}.$$

Note that the term $v_t$ enters in the forecast error both because it has an effect on parameter estimation and because of its effect on future noise. Now that we have separated the terms we are left with a sum of independent normal random variables. Hence we can obtain $\mathcal{E} \sim \mathcal{N}(0, \sigma^2 A^*)$, where

$$A^* = \left(-\frac{\tau\theta}{m}\right)^2 + (m-1)\left(-\frac{\tau(1+\theta)}{m}\right)^2 + \left(\theta - \frac{\tau}{m}\right)^2 + (\tau-1)(1+\theta)^2 + 1$$

can be simplified as Eq. (13) in the main text.

Figure 12: Error growth for large simulations of an IMA(1,1) process, to check Eqs. (14) and (15). Simulations are done using 5000 time series of 100 periods, all with $\mu = 0.04$, $K = 0.05$, $\theta = 0.6$. The insets show the distribution of forecast errors, as in Fig. 8, for $m = 5, 40$.

To obtain the results with estimated (instead of true) variance (Eqs. (14) and (15)), we follow the same procedure as in Appendix B.1, which assumes independence between the error and the estimated variance. Fig. 12 shows that the result is not exact but works reasonably well if $m > 15$.
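The variance factor $A^*$ of this appendix can be checked in two ways, sketched below under stated assumptions: first by comparing the sum of squared coefficients on the shocks with a compact closed form (obtained here by straightforward algebra and assumed to correspond to the simplified expression in the main text), and second by a small Monte Carlo simulation with known shock variance $\sigma^2 = 1$. The drift is set to zero because it cancels in the forecast error; function names and sample sizes are illustrative.

```python
import random


def a_star_closed(theta, m, tau):
    """Compact form of A* (algebraic simplification of the expanded sum)."""
    return -2 * theta + (1 + 2 * (m - 1) * theta / m + theta**2) * (tau + tau**2 / m)


def a_star_expanded(theta, m, tau):
    """Sum of squared coefficients on the shocks v_{t-m}, ..., v_{t+tau}."""
    return ((tau * theta / m) ** 2                    # v_{t-m}
            + (m - 1) * (tau * (1 + theta) / m) ** 2  # v_{t-m+1}, ..., v_{t-1}
            + (theta - tau / m) ** 2                  # v_t
            + (tau - 1) * (1 + theta) ** 2            # v_{t+1}, ..., v_{t+tau-1}
            + 1)                                      # v_{t+tau}


def simulated_error_variance(theta, m, tau, n_series=20000, seed=0):
    """Variance of the forecast error for a simulated IMA(1,1) with sigma = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_series):
        v = [rng.gauss(0.0, 1.0) for _ in range(m + tau + 1)]
        # Increments y_{i+1} - y_i = v_{i+1} + theta * v_i (zero drift)
        dy = [v[k + 1] + theta * v[k] for k in range(m + tau)]
        mu_hat = sum(dy[:m]) / m               # drift estimated as for a random walk
        error = sum(dy[m:]) - tau * mu_hat     # realized change minus forecast change
        total += error * error                 # the error has mean zero
    return total / n_series


if __name__ == "__main__":
    theta, m, tau = 0.6, 10, 5
    print("expanded :", a_star_expanded(theta, m, tau))
    print("closed   :", a_star_closed(theta, m, tau))
    print("simulated:", round(simulated_error_variance(theta, m, tau), 2))
```

Setting $\theta = 0$ in either expression recovers $A = \tau + \tau^2/m$ from Appendix B.1, which is a convenient consistency check.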

C Robustness checks 

C.1 Size of the learning window

As a check on the results presented in Section 5.3, we test the dependence of the forecast errors on the sample window $m$ for several different forecast horizons. The results are robust to a change of the size of learning window $m$. It is not possible to go below $m = 4$ because when $m = 3$ the Student distribution has $m - 1 = 2$ degrees of freedom, hence an infinite variance. Note that to make forecasts using a large $m$ only the datasets which are long enough can be included. The results for a few values of $m$ are shown in Fig. 9. Fig. 13 shows that the normalized mean squared forecast error consistently decreases as the learning window increases.
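The dependence on the learning window can be illustrated with the theoretical curves. The sketch below assumes that the expected mean squared normalized forecast error takes the form $\Xi(\tau) = \frac{m-1}{m-3}\,\frac{A^*}{1+\theta^2}$, i.e. the Student t variance of Appendix B.1 combined with the rescaling used in Eq. (19); this form and the function names are assumptions of the example.

```python
def a_star(theta, m, tau):
    """Variance factor A* (simplified form of the Appendix B.2 expression)."""
    return -2 * theta + (1 + 2 * (m - 1) * theta / m + theta**2) * (tau + tau**2 / m)


def xi(tau, m, theta=0.63):
    """Assumed theoretical mean squared normalized forecast error."""
    if m <= 3:
        # t(m - 1) has infinite variance when m - 1 <= 2
        raise ValueError("m must be at least 4 for a finite variance")
    return (m - 1) / (m - 3) * a_star(theta, m, tau) / (1 + theta**2)


if __name__ == "__main__":
    for tau in (1, 5, 20):
        row = ", ".join(f"m={m}: {xi(tau, m):.2f}" for m in (4, 8, 12, 16))
        print(f"tau = {tau:2d} -> {row}")
```

For each horizon the printed values fall as $m$ grows, and the function refuses $m = 3$, matching the remark above about infinite variance.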


Figure 13: Empirical mean squared normalized forecast errors as a function of the size of learning window for different forecast horizons. The dots are the empirical errors and the solid lines are those expected if the true model was an IMA(1,1) with $\theta_m = 0.63$.


C.2 Data selection 


We have checked how the results change when about 
half of the technologies are randomly selected and re¬ 
moved from the dataset. The shape of the normalized 
mean squared forecast error growth does not change 
and is shown in Fig. 14. The procedure is based on
10000 random trials selecting half the technologies. 



Figure 14: Robustness to dataset selection. Mean squared 
normalized forecast errors as a function of $\tau$ when using
only half of the technologies (26 out of 53), chosen at random.
The 95% confidence intervals, shown as dashed lines, are 
for the mean squared normalized forecast errors when we 
randomly select 26 technologies. 

C.3 Increasing $\tau_{max}$

In the main text we have shown the results for a forecast horizon up to $\tau_{max} = 20$. Moreover, we have used only the forecast errors up to $\tau_{max}$ to construct the empirical distribution of forecast errors in Fig. 8 and to estimate $\theta$ in Appendix D. Fig. 15 shows that if we use all the forecast errors up to the maximum with $\tau = 73$ the results do not change significantly.

Figure 15: Robustness to increasing $\tau_{max}$. Main results (i.e. as in Figs. 6 and 8) using $\tau_{max} = 73$. We use $\theta = 0$ and $\theta = 0.63$.

C.4 Heavy tail innovations 

To check the effect of non-normal noise increments on
$\Xi(\tau)$ we simulated random walks with drift with noise
increments drawn from a Student distribution with 3
or 7 degrees of freedom. Fig. 16 shows that fat tail
noise increments do not change the long horizon errors 
very much. While the IMA (1,1) model produces a 
parallel shift of the errors at medium to long horizons, 
the Student noise increments generate larger errors 
mostly at short horizons. Thus fat-tail innovations are 
not the most important source of discrepancy between 
the geometric random walk model and the empirical 
data. 




Figure 16: Effect of fat tail innovations on error growth. 
The figure shows the growth of the mean squared normal¬ 
ized forecast errors for four models, showing that introduc¬ 
ing fat tail innovations in a random walk with drift (RWD) 
mostly increases errors only at short horizons. 


D Procedure for selecting the autocorrelation parameter $\theta$

We select $\theta$ in several ways. The first method is to compute a variety of weighted means for the $\theta_j$ estimated on individual series. The main problem with this approach is that for some technology series the estimated $\theta$ was very close to 1 or -1, indicating mis-specification or estimation problems. After removing these 8 technologies the mean with equal weights for each technology is 0.27 with standard deviation 0.35. We can also compute the weighted mean at each forecast horizon, with the weights being equal to the share of each technology in the number of forecast errors available at a given forecast horizon. In this case the weighted mean $\theta_w(\tau)$ will not necessarily be constant over time. Fig. 17 (right) shows that $\theta_w(\tau)$ oscillates between 0.24 and 0.26. Taking the average over the first 20 periods gives $\theta_w = \frac{1}{20}\sum_{\tau=1}^{20} \theta_w(\tau) = 0.25$.

When doing this we do not mean to imply that our formulas are valid for a system with heterogeneous $\theta_j$; we simply propose a best guess for a universal $\theta$.

Figure 17: Estimation of $\theta$ as a global parameter.

Figure 18: Using the IMA model to make better forecasts. The right panel uses $\theta = 0.25$.

The second approach is to select $\theta$ in order to match the errors. As before we generate many artificial data sets using the IMA(1,1) model. Larger values of $\theta$ imply that using the simple random walk model to make the forecasts will result in higher forecast errors. Denote by $\Xi(\tau)_{empi}$ the empirical mean squared normalized forecast error as depicted in Fig. 6, and by $\Xi(\tau)_{sim,\theta}$ the expected mean squared normalized forecast error obtained by simulating IMA(1,1) datasets 3,000 times with a particular global value of $\theta$ and taking the average. We study the ratio of these two, averaged over all $1 \ldots \tau_{max} = 20$ periods, i.e. $Z(\theta) = \frac{1}{20}\sum_{\tau=1}^{20} \frac{\Xi(\tau)_{empi}}{\Xi(\tau)_{sim,\theta}}$. The values are shown in Fig. 17 (left). The value at which $|Z - 1|$ is minimum is at $\theta_m = 0.63$.
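The matching procedure amounts to a one-dimensional grid search. The sketch below illustrates it with two substitutions that should be kept in mind: the simulated error curve is replaced by an assumed closed form, $\Xi(\tau) = \frac{m-1}{m-3}\,\frac{A^*}{1+\theta^2}$, and the "empirical" curve is synthetic, generated from that same formula with $\theta = 0.63$ and $m = 5$. It therefore shows only that the search recovers a known value, not the paper's estimate.

```python
def a_star(theta, m, tau):
    """Variance factor A* (simplified form of the Appendix B.2 expression)."""
    return -2 * theta + (1 + 2 * (m - 1) * theta / m + theta**2) * (tau + tau**2 / m)


def xi(tau, m, theta):
    """Assumed closed-form stand-in for the simulated error curve."""
    return (m - 1) / (m - 3) * a_star(theta, m, tau) / (1 + theta**2)


def z_ratio(xi_emp, m, theta):
    """Average over horizons of empirical error divided by model error."""
    tau_max = len(xi_emp)
    return sum(xi_emp[tau - 1] / xi(tau, m, theta) for tau in range(1, tau_max + 1)) / tau_max


def match_theta(xi_emp, m, grid):
    """Grid value of theta whose ratio Z is closest to one."""
    return min(grid, key=lambda theta: abs(z_ratio(xi_emp, m, theta) - 1))


if __name__ == "__main__":
    m, tau_max = 5, 20
    synthetic = [xi(tau, m, 0.63) for tau in range(1, tau_max + 1)]  # not real data
    grid = [i / 100 for i in range(100)]
    print("recovered theta:", match_theta(synthetic, m, grid))
```

With real hindcasting errors, `synthetic` would be replaced by the empirical curve and `xi` by an average over simulated IMA(1,1) datasets, as described in the text.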

We also tried to make forecasts using the IMA model to check that forecasts are improved: which value of $\theta$ allows the IMA model to produce better forecasts? We apply the IMA(1,1) model with different values of $\theta$ to make forecasts (with the usual estimate of the drift term $\hat{\mu}$) and study the normalized error as a function of $\theta$. We record the mean squared normalized error and repeat this exercise for a range of values of $\theta$. The results for horizons 1, 2, and 10 are reported in Fig. 18 (left). This shows that the best value of $\theta$ depends on the time horizon $\tau$. The curve shows the mean squared normalized forecast error at a given forecast horizon as a function of the value of $\theta$ assumed to make the forecasts. The vertical lines show the minima at 0.26, 0.40, and 0.66. Given that the mean squared normalized forecast error increases with $\tau$, to make the curves fit on the plot the values are normalized by the mean squared normalized forecast error using $\theta = 0$. We also see that as the forecast horizon increases the improvement from taking the autocorrelation into account decreases (Fig. 18, right), as expected theoretically from an IMA process. Note that the improvement in forecasting error is only a few percent, even for $\tau = 1$, indicating that this makes little difference.


E Comparison of the empirical distribution of rescaled errors to the predicted Student distribution

In this section we check whether the deviations of the empirical forecast errors from the predicted theoretical distribution shown in Fig. 8 are consistent with statistical sampling error. For a given value of $\theta$ we generate a surrogate data set and surrogate forecasts mimicking our empirical data as described in Section 5. We then construct a sample surrogate (cumulative) distribution $P_k$ for the pooled rescaled errors $\epsilon^*$ of Eq. (14). We measure the distribution $P_k$ over 1,000 equally spaced values $x_k$ on the interval $[-15; 15]$. $P_k$ is estimated by simply counting the number of observations less than $x_k$. This is then compared to the predicted Student distribution by computing the difference $\Delta_k = P_k - t_k$ between the surrogate distribution and the Student distribution in each interval. We measure the overall deviation between the surrogate and the Student using three different measures of deviation: $\sum_k |\Delta_k|$, $\sum_k (\Delta_k)^2$, and $\max \Delta_k$. We then repeat this process 10,000 times to generate a histogram for each of the measures above, and compare this to the measured value of the deviation for the real data.

Results for doing this for $\theta_w = 0.25$ and $\theta_m = 0.63$ are reported in Fig. 19. For $\theta_w$ the resulting p-values (the shares of random datasets with a deviation higher than the empirical deviation) are (0.001, 0.002, 0.011) respectively using $(\sum_k |\Delta_k|, \sum_k (\Delta_k)^2, \max \Delta_k)$ to measure the deviation. In contrast for $\theta_m = 0.63$ the p-values are (0.21, 0.16, 0.20). Thus $\theta_m = 0.63$ is accepted and $\theta_w = 0.25$ is rejected. The uncorrelated case $\theta = 0$ is rejected even more strongly.


Figure 19: Expected deviations of the distribution of the rescaled variable $\epsilon^*$ of Eq. (14) from the Student distribution for hindcasting experiments as we do here using a dataset with the same properties as ours. The histograms show the sampling distribution of a given statistic and the thick black line shows the empirical value on real data. The simulations use $\theta = 0.25$ (3 upper panels) and $\theta = 0.63$ (3 lower panels).
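The deviation statistics of this appendix are simple to compute once the pooled rescaled errors are available. The sketch below is a minimal illustration: it builds the empirical cumulative distribution on 1,000 equally spaced points in $[-15, 15]$, computes the three deviation measures against a reference distribution supplied as a function, and evaluates a p-value as the share of surrogate deviations exceeding the observed one. The reference CDF is left as a parameter because the Python standard library has no Student t CDF (for example, `scipy.stats.t.cdf` could be passed in); all names are illustrative.

```python
from bisect import bisect_left


def empirical_cdf_on_grid(sample, lo=-15.0, hi=15.0, n_grid=1000):
    """Share of observations strictly less than each of n_grid equally spaced points."""
    xs = sorted(sample)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + k * step for k in range(n_grid)]
    return grid, [bisect_left(xs, x) / len(xs) for x in grid]


def deviation_measures(sample, reference_cdf, **grid_kwargs):
    """Return (sum |Delta_k|, sum Delta_k^2, max Delta_k) with Delta_k = P_k - t_k."""
    grid, p = empirical_cdf_on_grid(sample, **grid_kwargs)
    deltas = [pk - reference_cdf(x) for pk, x in zip(p, grid)]
    return (sum(abs(d) for d in deltas), sum(d * d for d in deltas), max(deltas))


def p_value(observed, surrogate_values):
    """Share of surrogate deviations strictly larger than the observed deviation."""
    return sum(s > observed for s in surrogate_values) / len(surrogate_values)


if __name__ == "__main__":
    # Toy inputs only: a tiny sample and a uniform reference CDF on [-15, 15]
    toy_sample = [-1.2, -0.3, 0.1, 0.4, 2.5]
    uniform_cdf = lambda x: (x + 15.0) / 30.0
    print(deviation_measures(toy_sample, uniform_cdf))
```

In the procedure described above, `deviation_measures` would be applied once to the real rescaled errors and 10,000 times to surrogate data sets, with `p_value` comparing the two for each of the three statistics.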

F A trend extrapolation of solar 
energy capacity 

In this paper we have been concerned with forecasting costs. For some applications it is also useful to forecast production. Our exploratory work so far suggests that, while the same basic methods can be applied, production seems more likely to deviate systematically from increasing exponentially. Nonetheless, Nagy et al. (2013) found that most of the technologies in our data set can be crudely (but usefully) approximated as having exponentially increasing production for a long span of their development cycle, and solar PV is no exception. Trend extrapolation can add perspective, even if it comes without good error estimates, and the example we present below motivates the need for more work to formulate better methods for assessing the reliability of production forecasts (for an example, see Shlyakhter et al. (1994)).


Many analysts have expressed concerns about the time required to build the needed capacity for solar energy to play a role in reducing greenhouse gas emissions. The "hi-Ren" (high renewable) scenario of the International Energy Agency assumes that PV will generate 16% of total electricity in 2050; this was recently increased from the previous estimate of only 11%. As a point of comparison, what do past trends suggest?

Though estimates vary, over the last ten years cumulative installed capacity of PV has grown at an impressive rate. According to BP's Statistical Review of World Energy 2014, during the period from 1983-2013 solar energy as a whole grew at an annual rate of 42.5% and in 2014 represented about 0.22% of total primary energy consumption, as shown in Fig. 20. By comparison total primary energy consumption grew at an annual rate of 2.6% over the period 1965-2013. Given that solar energy is an intermittent source, it is much easier for it to contribute when it supplies only a minority of energy: new supporting technologies will be required once it becomes a major player. If we somewhat arbitrarily pick 20% as a target, assuming both these trends continue unaltered, a simple calculation shows that this would be achieved in about 13.7 years. That is, under these assumptions in 2027 solar would represent 20% of energy consumption. Of course this is only an extrapolation, but it puts into perspective claims that solar energy cannot play an essential role in mitigating global warming on a relatively short timescale.
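The "simple calculation" can be written out explicitly. Assuming, as in the text, that solar grows at 42.5% per year from a 0.22% share while total primary energy grows at 2.6% per year, the time at which solar reaches a 20% share solves $0.0022\,(1.425)^t = 0.2\,(1.026)^t$. The short sketch below solves this by taking logarithms; the function and parameter names are illustrative.

```python
import math


def years_to_share(target=0.20, share_now=0.0022, solar_growth=0.425, total_growth=0.026):
    """Solve share_now * (1 + g_solar)^t = target * (1 + g_total)^t for t."""
    return math.log(target / share_now) / math.log((1 + solar_growth) / (1 + total_growth))


if __name__ == "__main__":
    print(f"years until solar reaches the target share: {years_to_share():.1f}")
```

This is a purely deterministic extrapolation, so it carries none of the error estimates developed for costs in the main text.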


Electricity generation uses about 40% of the world’s pri¬ 
mary energy but is expected to grow significantly. 

In this deterministic setting, the time to meet this goal is the solution for $t$ of $0.0022\,(1.425)^t = 0.2\,(1.026)^t$.



Figure 20: Global energy consumption due to each of the major sources from BP Statistical Review of World Energy (BP 2013). Under a projection for solar energy obtained by fitting to the historical data the target of 20% of global primary energy is achieved in 2027.

Of course the usual caveats apply, and the limitations of such forecasting are evident in the historical series of Fig. 20. The increase of solar is far from smooth, wind has a rather dramatic break in its slope in roughly 1988, and a forecast for nuclear power made in 1980 based on production alone would have been far more optimistic than one today. It would be interesting to use a richer economic model to forecast cost and production simultaneously, but this is beyond the scope of this paper. The point here was simply to show that if growth trends continue as they have in the past significant contributions by solar are achievable.